Abstract. We present two theoretically interesting and empirically
successful techniques for improving the linear programming approaches,
namely graph transformation and local cuts, in the context of the Steiner
problem. We show the impact of these techniques on the solution of the
largest benchmark instances ever solved.



1 Introduction 

In combinatorial optimization many algorithms are based (explicitly or
implicitly) on linear programming approaches. A typical application of linear
programming to optimization problems works as follows: First, the combinatorial
problem is reformulated as an integer linear program. Then, some integrality
constraints are relaxed and one of the numerous methods for solving (or
approximating) a linear program is applied. For NP-hard optimization problems,
any linear relaxation of polynomial size (and any polynomial time solvable
relaxation) is bound to have an integrality gap (unless P = NP). So the quality
of the underlying relaxation can have a decisive impact on the performance of
the overall algorithm. As a consequence, methods for generating tight lower
bounds are significant contributions to elaborate algorithms for combinatorial
optimization problems; see for example the long history of research for the
Traveling Salesman Problem (TSP) focusing on linear programming [3, 4, 14, 15].

In this work, we improve the linear programming based techniques for the
Steiner tree problem in networks, which is the problem of connecting a given
subset of the vertices of a weighted graph at minimum cost. It is a classical
NP-hard problem [12] with many important applications in network design in
general and VLSI design in particular. For background information on this
problem, see [6, 11].

For the Steiner problem, linear programming approaches are particularly 
important, since the best known practical algorithms for optimal solutions, for 
heuristic Steiner trees, and for preprocessing techniques, which reduce the size of 
the problem instance without changing an optimal solution, all make frequent use 



K. Jansen et al. (Eds.): WEA 2003, LNCS 2647, pp. 1-14, 2003.
© Springer-Verlag Berlin Heidelberg 2003 






Ernst Althaus et al. 



of linear programming techniques [17, 18, 19, 20]. Typical situations where linear 
programming is used are the computation of lower bounds in the context of an 
exact algorithm, bound-based reduction techniques [17], and partitioning-based 
reduction techniques [18]. Especially for large and complex problem instances, 
very small differences in the integrality gap can cause an enormous additional 
computational effort in the context of an exact algorithm. Therefore, methods 
for improving the quality of the lower bounds are very important. 

In Section 2, we give some definitions, including the directed cut relaxation,
which is the basis for many linear programming approaches for the Steiner
problem. Then, we will present two approaches for improving the lower bound
provided by this relaxation:

- In Section 3, we introduce the “vertex splitting” technique: We identify
  locations in the network that contribute to the integrality gap and split up
  the decisive vertices in these locations. Thereby, we transform the problem
  instance into one that is equivalent with respect to the integral solution,
  but the solution of the relaxation may improve.

  This idea is inspired by the column replacement techniques that were
  introduced by Balas and Padberg [5] and generalized by Haus et al. [10] and
  Gentile et al. [9]. In these and other papers a general technique for solving
  integer programs is developed. However, these techniques are mainly viewed as
  primal algorithms, and extensions for combinatorial optimization problems are
  presented for the Stable Set problem only. Furthermore, these extensions are
  not yet part of a practical algorithm (the general integer programming
  techniques have been applied successfully). Thus, we are the first to apply
  this basic idea in a practical algorithm for a concrete combinatorial
  optimization problem.

- In Section 4, we show how to adapt the “local cuts” approach, introduced
  by Applegate, Bixby, Chvátal, and Cook [4] in the context of the TSP:
  Additional constraints are generated using projection, lifting and optimal
  solutions of subinstances of the problem. To apply this approach to the
  Steiner problem, we develop new shrinking operations and separation
  techniques.

In Section 5, we embed these two approaches into our successful algorithm for
solving Steiner tree problems and present some experimental results. Like many
other elaborate optimization packages, our program consists of many parts (the
source code has approximately 30000 instructions, not including the LP-solver
code). Thus, this paper describes only a small part of the whole program.
However, among other results, we will show that this part is decisive for the
solution of the problem instance d15112, which is to our knowledge the largest
benchmark Steiner tree instance ever solved. Furthermore, we believe that these
new techniques are also interesting for other combinatorial optimization
problems.

The other parts of the program package are described in a series of papers [17,
18, 19, 20]. Note that there is no overlap between these papers and the work
presented here. Some proofs in this paper had to be omitted due to the page
limit; they are given in [1].
2 Definitions

The Steiner problem in networks can be stated as follows (see [11] for details):
Given an (undirected, connected) network $G = (V, E, c)$ (with vertices
$V = \{v_1, \ldots, v_n\}$, edges $E$ and edge weights $c_e > 0$ for all
$e \in E$) and a set $R$, $\emptyset \neq R \subseteq V$, of required vertices
(or terminals), find a minimum weight tree in $G$ that spans $R$ (a Steiner
minimal tree). If we want to stress that $v_i$ is a terminal, we will write
$z_i$ instead of $v_i$. We also look at a reformulation of this problem using
the (bi-)directed version of the graph, because it yields stronger relaxations:
Given $G = (V, E, c)$ and $R$, find a minimum weight arborescence in
$\vec{G} = (V, A, c)$ ($A := \{[v_i, v_j], [v_j, v_i] \mid (v_i, v_j) \in E\}$,
$c$ defined accordingly) with a terminal (say $z_1$) as the root that spans
$R^{z_1} := R \setminus \{z_1\}$.

A cut in $\vec{G} = (V, A, c)$ (or in $G = (V, E, c)$) is defined as a partition
$C = \{\overline{W}, W\}$ of $V$ ($\emptyset \subset W \subset V$;
$V = W \cup \overline{W}$). We use $\delta^-(W)$ to denote the set of arcs
$[v_i, v_j] \in A$ with $v_i \in \overline{W}$ and $v_j \in W$. For simplicity,
we write $\delta^-(v_i)$ instead of $\delta^-(\{v_i\})$. The sets $\delta^+(W)$
and, for the undirected version, $\delta(W)$ are defined similarly.

In the integer programming formulations we use (binary) variables
$x_{[v_i, v_j]}$ for each arc $[v_i, v_j] \in A$, indicating whether this arc is
in the solution ($x_{[v_i, v_j]} = 1$) or not ($x_{[v_i, v_j]} = 0$). For any
$B \subseteq A$, $x(B)$ is short for $\sum_{a \in B} x_a$.

For every integer program $P$, $LP$ denotes the linear relaxation of $P$, and
$v(LP)$ denotes the value of an optimal solution for $LP$.

Other definitions can be found in [8, 11]. 



2.1 The Directed Cut Formulation 

The directed cut formulation $P_C$ was stated in [25]. An undirected version was
already introduced in [2], but the directed variant yields a stronger relaxation.

    $c \cdot x \to \min$,
    $x(\delta^-(W)) \geq 1$    ($z_1 \notin W$, $R \cap W \neq \emptyset$),    (1)
    $x \in \{0, 1\}^{|A|}$.    (2)

The constraints (1) are called Steiner cut constraints. They guarantee that in
any arc set corresponding to a feasible solution, there is a path from $z_1$ to
any other terminal.

There is a group of constraints (see for example [13]) that can make $LP_C$
stronger. We call them flow-balance constraints:

    $x(\delta^-(v_i)) \leq x(\delta^+(v_i))$    ($v_i \in V \setminus R$).    (3)

We denote the linear program that consists of $LP_C$ and (3) by $LP_{C+FB}$.
In [16] we gave a comprehensive overview on relaxations for the Steiner tree
problem.
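
As an illustration of constraint (3), the following minimal Python sketch lists
the non-terminals at which a given fractional solution violates flow balance.
The function name, the dictionary representation of the arc values and the
tolerance `eps` are assumptions made for this example; they are not taken from
the program package described in this paper.

```python
from collections import defaultdict

def violated_flow_balance(x, terminals, eps=1e-9):
    """Return the non-terminals v with x(delta^-(v)) > x(delta^+(v)).

    x maps arcs (tail, head) to their LP values (hypothetical representation).
    """
    inflow = defaultdict(float)
    outflow = defaultdict(float)
    for (tail, head), value in x.items():
        outflow[tail] += value
        inflow[head] += value
    vertices = set(inflow) | set(outflow)
    return sorted(v for v in vertices
                  if v not in terminals and inflow[v] > outflow[v] + eps)

# Two half-units enter j, but only one half-unit leaves it.
x = {("z1", "a"): 0.5, ("z1", "b"): 0.5,
     ("a", "j"): 0.5, ("b", "j"): 0.5, ("j", "z2"): 0.5}
violated = violated_flow_balance(x, {"z1", "z2"})
```

In a cutting plane loop, each reported vertex would give rise to one added
constraint of type (3).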
3 Graph Transformation: Vertex Splitting

In this section, we describe a new technique for effectively improving the lower
bound corresponding to the directed cut relaxation by manipulating the
underlying network.

We use the property that in an optimal directed Steiner tree, each vertex
has in-degree at most 1. Implicitly, we realize a case distinction: If an arc
$[v_i, v_j]$ is in an optimal Steiner tree, we know that other arcs in
$\delta^-(v_j)$ cannot be in the tree. The only necessary operation to realize
this case distinction for the Steiner problem is the splitting of a vertex. A
vertex $v_j$ is replaced by several vertices $v_j^i$, one for each arc
$[v_i, v_j]$ entering $v_j$. Each new vertex $v_j^i$ has only one incoming arc
and essentially the same outgoing arcs as $v_j$. In Figure 1, the splitting of
vertex $v_j$ is depicted. The explanation of the figure also provides some
intuition how splitting can be useful. In Section 3.3 we describe how we
identify candidates for splitting.

The splitting operation is described formally by the pseudocode below. We
maintain an array orig that points for each vertex in the transformed network to
the vertex in the original network that it derives from. Initially,
orig[$v_j$] = $v_j$ for all $v_j \in V$. With $P(v_i)$ we denote the longest
common suffix of all paths from $z_1$ to $v_i$ after every path is translated
back to the original network. The intuition behind this definition is that if
$v_i$ is in an optimal Steiner arborescence, $P(v_i)$ must also be in the
arborescence after it is translated into the original network. Note that the
path $P(v_i)$ consists of vertices in the original network and may contain
cycles; in this case, $v_i$ cannot be part of an optimal arborescence. In
Figure 1, $P(v_a)$ consists of $v_a$ and $P(v_j^a)$ is the path of length 1 from
$v_a$ to $v_j$. To compute $P(v_i)$, one can reverse all arcs and use
breadth-first search. The main purpose of using $P(v_i)$ is to avoid inserting
unnecessary arcs. This can improve the value of and the computation times for
the lower bound. It is also necessary for the proof of termination in
Section 3.2.

Fig. 1. Splitting of vertex $v_j$. The filled circles are terminals, $z_1$ is
the root, all arcs have cost 1. An optimal Steiner arborescence has value 6 in
each network. In the left network $v(LP_{C+FB})$ is 5.5 (set the x-values of the
dashed arcs to 0.5 and of $[v_j, z_4]$ to 1), but 6 in the right network (again,
set the x-values of the dashed arcs and of $[v_j^a, z_4]$ and $[v_j^b, z_4]$ to
0.5). The difference is that in the left network, there is a situation that is
called “rejoining of flows”: Flows from $z_1$ to $z_2$ and from $z_1$ to $z_3$
enter $v_j$ on different arcs, but leave on the same arc, so they are accounted
for in the x variables only once. Before splitting, the x-value corresponding to
the arc $[v_j, v_c]$ is 0.5; after splitting, the corresponding x-values sum up
to 1.

For ease of presentation, we assume that the root terminal $z_1$ has no
incoming arcs, and that all other terminals have no outgoing arcs. If this is
not the case, we simply add copies of the terminals and connect them with
appropriate zero cost arcs to the old terminals.

SPLIT-VERTEX($G$, $v_j$, orig): (assuming $v_j \notin R$)

    forall $[v_i, v_j] \in \delta^-(v_j)$:
        if $P(v_i)$ contains a cycle or orig[$v_j$] in $P(v_i)$:
            continue with next arc in $\delta^-(v_j)$
        insert a new vertex $v_j^i$ into $G$, orig[$v_j^i$] := orig[$v_j$]
        insert an arc $[v_i, v_j^i]$ with cost $c(v_i, v_j)$ into $G$
        forall $[v_j, v_k] \in \delta^+(v_j)$:
            if orig[$v_k$] not in $P(v_i)$:
                insert an arc $[v_j^i, v_k]$ with cost $c(v_j, v_k)$ into $G$
    delete $v_j$
    delete all vertices that are not reachable from $z_1$
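
The pseudocode can be mirrored by a short Python sketch. This is only an
illustration under simplifying assumptions: the paths $P(v_i)$ are assumed to be
precomputed and passed in as a dictionary `P`, arcs are stored in a dictionary
keyed by (tail, head), and a new vertex $v_j^i$ is represented by the
hypothetical label `(vj, vi)`. None of these choices is taken from the actual
implementation.

```python
from collections import defaultdict

def split_vertex(arcs, vj, orig, P, root):
    """Split the non-terminal vj and return the new arc dictionary.

    arcs: dict {(tail, head): cost}
    orig: dict vertex -> original vertex (extended in place for new vertices)
    P:    dict vertex -> list of original vertices, assumed precomputed
    """
    incoming = [(u, c) for (u, v), c in arcs.items() if v == vj]
    outgoing = [(w, c) for (u, w), c in arcs.items() if u == vj]
    # Deleting vj removes every arc incident to it.
    result = {a: c for a, c in arcs.items() if vj not in a}
    for vi, cost_in in incoming:
        path = P[vi]
        # Skip this arc if P(vi) contains a cycle or already visits orig[vj].
        if len(set(path)) < len(path) or orig[vj] in path:
            continue
        copy = (vj, vi)  # plays the role of the new vertex v_j^i
        orig[copy] = orig[vj]
        result[(vi, copy)] = cost_in
        for vk, cost_out in outgoing:
            if orig[vk] not in path:
                result[(copy, vk)] = cost_out
    # Keep only what is still reachable from the root.
    successors = defaultdict(list)
    for tail, head in result:
        successors[tail].append(head)
    reachable, stack = {root}, [root]
    while stack:
        for w in successors[stack.pop()]:
            if w not in reachable:
                reachable.add(w)
                stack.append(w)
    return {a: c for a, c in result.items() if a[0] in reachable}

arcs = {("z1", "a"): 1, ("z1", "b"): 1, ("a", "j"): 1,
        ("b", "j"): 1, ("j", "z2"): 1, ("j", "a"): 1}
orig = {v: v for v in ["z1", "a", "b", "j", "z2"]}
P = {"a": ["a"], "b": ["z1", "b"]}
new_arcs = split_vertex(arcs, "j", orig, P, "z1")
```

In this small example the copy of j reached via a does not get an arc back to a,
because a lies on $P(a)$, while the copy reached via b keeps that arc.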



3.1 Correctness 

In this section, we prove that the transformation is valid, i.e., it does not change 
the value of an optimal Steiner arborescence. 

Lemma 1. Any optimal Steiner arborescence with root $z_1$ in the original
network can be transformed into a feasible Steiner arborescence with root $z_1$
in the transformed network with the same cost and vice versa.

Proof. We consider one splitting operation on vertex $v_j \in V \setminus R$,
transforming a network $G$ into $G'$. Repeating the argument extends the result
to multiple splits. We use a condition (†) for a tree $T$ denoting that for
every $v_k, v_l$ in $T$, it holds: orig[$v_k$] = orig[$v_l$] $\Rightarrow$
$v_k = v_l$. Note that condition (†) holds for an optimal Steiner arborescence
in the original network.

Let $T$ be an optimal Steiner arborescence with root $z_1$ for $G$ satisfying
(†). If $v_j \notin T$, $T$ is part of $G'$ and we are done. If $v_j \in T$,
there is exactly one arc $[v_i, v_j] \in T$. When $[v_i, v_j]$ is considered in
the splitting, $P(v_i)$ is a subpath of the path from $z_1$ to $v_i$ in $T$
after it is translated to the original network. Together with (†) it follows
that neither orig[$v_j$], nor orig[$v_k$] for any $[v_j, v_k] \in T$ is in
$P(v_i)$. Therefore, all arcs $[v_j, v_k] \in T$ can be replaced by arcs
$[v_j^i, v_k]$ and the arc $[v_i, v_j]$ can be replaced by $[v_i, v_j^i]$. The
transformed $T$ is part of $G'$, connects all terminals, has the same cost as
$T$ and satisfies condition (†).

Now, let $T'$ be an optimal Steiner arborescence for $G'$. Obviously, $T'$ can
be transformed into a feasible solution $T$ with no higher cost for $G$.

3.2 Termination 

In this section, we show that iterating the splitting operation will terminate.

Lemma 2. For all non-terminals $v_j$, $P(v_j)$ is the common suffix of all paths
$P(v_i)$ appended by orig[$v_j$] for all $v_i$, $[v_i, v_j] \in \delta^-(v_j)$.

Lemma 3. For any two non-terminals $v_s$ and $v_t$, $v_s \neq v_t$, $P(v_s)$ is
not a suffix of $P(v_t)$.

Lemma 4. After splitting a vertex $v_j$ with in-degree greater than 1, for any
newly inserted vertex $v_j^i$ it holds that $P(v_j^i)$ is longer than $P(v_j)$
was before the split.

Lemma 5. Repeated splitting of vertices with in-degree greater than 1 will stop
with a network in which all non-terminals have in-degree 1. As a consequence,
there is exactly one path from $z_1$ to $v_i$ for all non-terminals $v_i$.

3.3 Implementation Issues 

Of course, for a practical application one does not want to split all vertices,
which could blow up the network exponentially. In a cutting plane algorithm one
first adds violated Steiner cut or flow-balance constraints. They can be found
by min-cut computations [17] and by summing the variables of the incoming and
outgoing arcs of non-terminals, respectively. If no such constraint can be
found, we search for good candidates for the splitting procedure, i.e., vertices
where more than one incoming arc and at least one outgoing arc have an x-value
greater than zero. After splitting these vertices, the modified network will be
used for the computation of new constraints, using the same algorithms as
before. To represent this transformation in the linear program, we add new
variables for the newly added arcs, and additional constraints that the x-values
for all newly added arcs corresponding to an original arc $[v_i, v_j]$ must sum
up to $x_{[v_i, v_j]}$. Using this procedure the constraints calculated for the
original network can still be used.
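
The candidate rule just described is easy to state in code. The following is a
minimal sketch, assuming the LP solution is available as a dictionary from arcs
to x-values; the function name and the tolerance `eps` are hypothetical and only
serve the illustration.

```python
from collections import defaultdict

def split_candidates(x, terminals, eps=1e-9):
    """Non-terminals with more than one positive incoming arc and at least
    one positive outgoing arc in the LP solution x = {(tail, head): value}."""
    positive_in = defaultdict(int)
    positive_out = defaultdict(int)
    for (tail, head), value in x.items():
        if value > eps:
            positive_out[tail] += 1
            positive_in[head] += 1
    return sorted(v for v, count in positive_in.items()
                  if v not in terminals and count > 1 and positive_out.get(v, 0) >= 1)

x = {("z1", "a"): 0.5, ("z1", "b"): 0.5, ("a", "j"): 0.5,
     ("b", "j"): 0.5, ("j", "z2"): 1.0, ("a", "b"): 0.0}
candidates = split_candidates(x, {"z1", "z2"})
```

Here only j qualifies: b has two incoming arcs, but one of them carries value
zero.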

4 Project, Separate, and Lift: Local Cuts 

Let $S = (G, R) = (V, E, c, R)$ be an instance of the Steiner problem. Let
$\mathcal{ST}(S)$ be the set of all incidence vectors of Steiner trees of $S$
and $\mathcal{SG}(S) = \mathcal{ST}(S) + \mathbb{R}_+^{|E|}$.
The proofs (also of Lemma 6 and Lemma 10) are given in [1].
Fig. 2. The feasible integer solutions are marked as dots, the fractional
solution to separate by the cross. If we project the solutions to line 1, we can
obtain a valid violated inequality and lift it back to the original space. If we
project to line 2, the fractional solution falls into the convex hull of the
integer solutions and no such inequality can be found.



We call the elements of $\mathcal{SG}(S)$ the Steiner graphs of $S$. We consider
Steiner graphs, since Steiner graphs are invariant under the shrink operation
(defined in Section 4.1). Note that the values $x_{(v_i, v_j)}$ are not
restricted to be integral or bounded. It is obvious that if the objective
function is non-negative, there exists a minimum Steiner graph that is a Steiner
tree. Thus all vertices of the polyhedron $\mathrm{conv}(\mathcal{SG}(S))$ are
Steiner trees. Furthermore, $\mathrm{conv}(\mathcal{SG}(S))$ is full-dimensional
if $G$ is connected.

From a high level view, local cuts can be described as follows. Assume we
want to separate $x^*$ from $\mathrm{conv}(\mathcal{SG}(S))$. Using a linear
mapping $\phi$, we project the given point $x^*$ into a small-dimensional vector
$\phi(x^*)$ and solve the separation problem over
$\mathrm{conv}(\phi(\mathcal{SG}(S)))$. If we can find a violated inequality
$a \cdot \tilde{x} \geq b$ that separates $\phi(x^*)$ from
$\mathrm{conv}(\phi(\mathcal{SG}(S)))$, we know that the linear inequality
$a \cdot \phi(x) \geq b$ separates $x^*$ from $\mathrm{conv}(\mathcal{SG}(S))$.
The method is illustrated in Figure 2.

To make this method work, we have to choose $\phi$ such that

1. there is a good chance that
   $\phi(x^*) \notin \mathrm{conv}(\phi(\mathcal{SG}(S)))$ if
   $x^* \notin \mathrm{conv}(\mathcal{SG}(S))$,

2. we can solve the separation problem over
   $\mathrm{conv}(\phi(\mathcal{SG}(S)))$ efficiently and

3. the inequalities $a \cdot \phi(x) \geq b$ are strong.

We choose $\phi$ in such a way that for every solution $x \in \mathcal{SG}(S)$
of our Steiner problem instance $S$, the projected $\phi(x)$ is a Steiner graph
of a small Steiner problem instance $S^\phi$, i.e.,
$\mathrm{conv}(\phi(\mathcal{SG}(S))) = \mathrm{conv}(\mathcal{SG}(S^\phi))$ for
an instance $S^\phi$ of the Steiner problem. Since our Steiner tree program
package tends to be very efficient for solving small Steiner problem instances,
we can handle the separation problem, as we will see in Section 4.2.

We use iterative shrinking to obtain the linear mappings. We review the
well-known concept of shrinking in the next section. After that, we introduce
our separation algorithm for small Steiner graph instances. So far, we always
assumed that we are looking at the undirected version of the Steiner problem,
since our separation algorithm is much faster for this variant. As seen above,
the directed cut relaxation is stronger than the undirected variant. In
Section 4.3, we discuss how we can use the directed formulation without solving
directed Steiner graph instances in the separation algorithm.

4.1 Shrinking 

We define our linear mappings as an iterative application of the following
simple, well-known mapping, called shrinking. For the Steiner problem, shrinking
was introduced by Chopra and Rao [7].

Shrinking means to replace two vertices $v_a$ and $v_b$ by a new vertex
$\langle v_a, v_b \rangle$ and replace edges $(v_i, v_a)$ and $(v_i, v_b)$ by an
edge $(v_i, \langle v_a, v_b \rangle)$ with value
$x^*_{(v_i, v_a)} + x^*_{(v_i, v_b)}$ (we assume $x^*_{(v_i, v_j)} = 0$ if
$(v_i, v_j) \notin E$). The new vertex $\langle v_a, v_b \rangle$ is in the set
of terminals $R$ if $v_a$ or $v_b$ (or both) are in $R$. This informally defines
the mapping $\phi$ and the instance $S^\phi$. Note that for any incidence vector
of a Steiner graph for the original problem, the new vector is the incidence
vector of a Steiner graph in the reduced problem. Furthermore, for every Steiner
graph $\tilde{x}$ in $S^\phi$ there is a Steiner graph $x \in \mathcal{SG}(S)$
such that $\phi(x) = \tilde{x}$. Thus
$\mathrm{conv}(\phi(\mathcal{SG}(S))) = \mathrm{conv}(\mathcal{SG}(S^\phi))$.

Note that if we iteratively shrink a set of vertices $W \subset V$ into one
vertex $\langle W \rangle$, the obtained linear mapping is independent of the
order in which we apply the shrinks. We denote the unique linear mapping which
shrinks a subset $W \subset V$ into one vertex by $\phi^W$.
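
To make the shrinking map concrete, here is a minimal Python sketch that shrinks
a vertex set $W$ in one step. It assumes that edge values are stored in a
dictionary keyed by `frozenset({u, v})` and that the new vertex is labelled by
the sorted tuple of the vertices in $W$; both conventions are hypothetical and
chosen only for this example.

```python
from collections import defaultdict

def shrink(x, W, terminals):
    """Shrink the vertex set W into a single vertex.

    x maps undirected edges frozenset({u, v}) to values. Returns the shrunken
    edge values and the new terminal set.
    """
    W = set(W)
    merged = tuple(sorted(W))  # label of the new vertex <W>

    def image(v):
        return merged if v in W else v

    shrunk = defaultdict(float)
    for edge, value in x.items():
        u, v = tuple(edge)
        a, b = image(u), image(v)
        if a != b:  # edges inside W vanish
            shrunk[frozenset((a, b))] += value
    # <W> is a terminal if at least one vertex of W is a terminal.
    return dict(shrunk), {image(t) for t in terminals}

x = {frozenset(("a", "b")): 1.0, frozenset(("a", "c")): 0.5,
     frozenset(("b", "c")): 0.25, frozenset(("c", "d")): 1.0}
y, new_terminals = shrink(x, {"a", "b"}, {"a", "d"})
```

The two edges from c into the shrunken pair are merged into one edge whose value
is the sum 0.5 + 0.25.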

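We have developed conditions on $x^*$ under which we can prove that
$\phi(x^*)$ is not in the convex hull of $\mathcal{SG}(S^\phi)$ if $x^*$ is not
in the convex hull of $\mathcal{SG}(S)$.

Lemma 6. Let $x^* \geq 0$.

1. (edge of value 1): Let $x^*_{(v_a, v_b)} \geq 1$ and $W = \{v_a, v_b\}$.
   $x^* \in \mathrm{conv}(\mathcal{SG}(S)) \Leftrightarrow
   \phi^W(x^*) \in \mathrm{conv}(\mathcal{SG}(S^{\phi^W}))$.

2. (non-terminal of degree 2): Let $v_a$ be in $V \setminus R$ and the vertices
   $(v_1, \ldots, v_k)$ in $V$ be ordered according to their $x^*_{(v_i, v_a)}$
   value (in decreasing order). Furthermore, let $W = \{v_a, v_1\}$. If
   $x^*_{(v_3, v_a)} = 0$, then
   $x^* \in \mathrm{conv}(\mathcal{SG}(S)) \Leftrightarrow
   \phi^W(x^*) \in \mathrm{conv}(\mathcal{SG}(S^{\phi^W}))$.

3. (cut of value 1): Let $W$ be such that $x^*(\delta(W)) = 1$ and
   $\emptyset \neq R \cap W \neq R$. Let $\overline{W} = V \setminus W$.
   $x^* \in \mathrm{conv}(\mathcal{SG}(S)) \Leftrightarrow
   \phi^W(x^*) \in \mathrm{conv}(\mathcal{SG}(S^{\phi^W})) \wedge
   \phi^{\overline{W}}(x^*) \in \mathrm{conv}(\mathcal{SG}(S^{\phi^{\overline{W}}}))$.

4. (biconnected components): Let $U, W \subset V$ and $v_a \in V$ be such that
   $U \cup W = V$, $U \cap W = \{v_a\}$ and $x^*_{(v_k, v_l)} = 0$ for all
   $v_k \in U \setminus \{v_a\}$ and $v_l \in W \setminus \{v_a\}$. Furthermore,
   let $\emptyset \neq R \cap W \neq R$.
   $x^* \in \mathrm{conv}(\mathcal{SG}(S)) \Leftrightarrow
   \phi^U(x^*) \in \mathrm{conv}(\mathcal{SG}(S^{\phi^U})) \wedge
   \phi^W(x^*) \in \mathrm{conv}(\mathcal{SG}(S^{\phi^W}))$.

5. (triconnected components): Let $U, W \subset V$ and $v_a, v_b \in V$ be such
   that $U \cup W = V \setminus \{v_a\}$, $U \cap W = \{v_b\}$ and
   $x^*_{(v_k, v_l)} = 0$ for all $v_k \in U \setminus \{v_b\}$ and
   $v_l \in W \setminus \{v_b\}$. Let furthermore $x^*(\delta(v_a)) = 1$ and
   $v_a, v_b \in R$.
   $x^* \in \mathrm{conv}(\mathcal{SG}(S)) \Leftrightarrow
   \phi^U(x^*) \in \mathrm{conv}(\mathcal{SG}(S^{\phi^U})) \wedge
   \phi^W(x^*) \in \mathrm{conv}(\mathcal{SG}(S^{\phi^W}))$.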
Applying these “exact” shrinks does not project the solution of the current
linear program into the projected convex hull of all integer solutions, i.e., if
the solution of the current linear program has not reached the value of the
integer optimum, we can find a valid, violated constraint in the shrunken graphs.
Unfortunately, in many cases the graphs are still too large after applying these
shrinks and we have to apply some “heuristic” shrinks afterwards.

In the implementation, we use a parameter max-component-size, which is
initially 15. If the number of vertices in a graph after applying all “exact”
shrinks is not higher than max-component-size, we start FIND-FACET (see
Section 4.2). Otherwise, we start a breadth-first search from different starting
positions, shrink everything except the first max-component-size vertices
visited by the BFS, try the “exact” shrinks again and start FIND-FACET. If it
turns out that we could not find a valid, violated constraint, we increase
max-component-size. We also tried other “heuristic” shrinks by relaxing “exact”
shrinks, e.g., accepting minimum Steiner cuts with value above 1, or edges that
have an x-value close to 1. But we could not come to a definitive conclusion
about which shrinks are best, and we believe that there is still room for
improvement.
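
The breadth-first selection can be sketched as follows. The sketch only
determines which vertices are kept and which are handed over to shrinking; how
the remaining vertices are grouped into shrunken vertices is not specified
above, so that step is left out. The adjacency-list representation and the
function name are assumptions for this example.

```python
from collections import deque

def bfs_component(adjacency, start, max_component_size):
    """Return (kept, rest): the first max_component_size vertices visited by a
    BFS from start, and all remaining vertices (the ones to be shrunk)."""
    order = [start]
    visited = {start}
    queue = deque([start])
    while queue and len(order) < max_component_size:
        u = queue.popleft()
        for w in adjacency[u]:
            if w not in visited:
                visited.add(w)
                order.append(w)
                queue.append(w)
                if len(order) == max_component_size:
                    break
    kept = set(order)
    return kept, set(adjacency) - kept

# A path 0 - 1 - 2 - 3 - 4, explored from vertex 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
kept, rest = bfs_component(path, 0, 3)
```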

As we will see in the next section, our separation algorithm finds a facet
of $\mathrm{conv}(\mathcal{SG}(S^\phi))$. As shown in Theorem 4.1 of [7], the
lifted inequality is then a facet of $\mathrm{conv}(\mathcal{SG}(S))$.

4.2 Separation: Finding Facets

Assume we want to separate $x^*$ from $\mathrm{conv}(\mathcal{SG}(S))$. Note
that we actually separate $\phi(x^*)$ from
$\mathrm{conv}(\mathcal{SG}(S^\phi))$, but this problem can be solved with the
same algorithm.

As we will see, the separation problem can be formulated as a linear program 
with a row for every Steiner graph. Trying to solve this linear program using 
cutting planes, we have the problem that the number of Steiner graphs (contrary 
to the case of Steiner trees) is infinite and optimal Steiner graphs need not exist. 
Note that the same complication arises when applying local cuts to the Traveling 
Salesman Problem. 

The solution for the separation problem is much simpler and more elegant 
for the Steiner tree case than for the Traveling Salesman case. The key is the 
following Lemma, a slight variation of Lemma 3.1.2 in [7]. 

Lemma 7. All facets of $\mathrm{conv}(\mathcal{SG}(S))$ different from
$x_{(v_a, v_b)} \geq 0$ for an edge $(v_a, v_b) \in E$ can be written in the
form $a \cdot x \geq 1$ with $a \geq 0$.

Thus, if $x^* \notin \mathrm{conv}(\mathcal{SG}(S))$, we can find an inequality
of the form $a \cdot x \geq 1$, $a \geq 0$, that separates $x^*$ from
$\mathrm{conv}(\mathcal{SG}(S))$. Note that if $a \geq 0$, there is a Steiner
tree $t \in \mathcal{SG}(S)$ minimizing $a \cdot t$.

Thus an exact separation algorithm can be stated as follows (the name arises
from the fact that the algorithm will find a facet of
$\mathrm{conv}(\mathcal{SG}(S))$, as we will see later).
FIND-FACET($G = (V, E)$, $R$, $x^*$)

1 $T$ := incidence vector of a Steiner tree for $G$, $R$
2 repeat:
3     solve LP: min $x^* \cdot a$, $Ta \geq 1$, $a \geq 0$ (basic solution)
4     if $x^* \cdot a \geq 1$: return “$x^* \in \mathrm{conv}(\mathcal{SG}(S))$”
5     find minimum Steiner tree $t$ for $G = (V, E, a)$, $R$
6     if $t \cdot a < 1$: add $t$ as a new row to matrix $T$
7     else: return $a \cdot x \geq 1$



The algorithm terminates, since there are only a finite number of Steiner
trees in $\mathcal{ST}(S)$ and as soon as the minimum Steiner tree $t$ computed
in Line 5 is already in $T$, we terminate because $a \cdot t \geq 1$ is an
inequality of the linear program solved in Line 3.
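
Line 5 of FIND-FACET needs a minimum Steiner tree for the small shrunken
instance with edge weights $a$. The paper uses its own program package for this
step. Purely as a stand-in for very small instances, the following sketch
enumerates all subsets of non-terminals and takes the cheapest spanning tree of
each induced subgraph, which is exact for non-negative weights but exponential
in the number of non-terminals. All names here are hypothetical.

```python
from itertools import combinations

def spanning_tree(vertices, edges):
    """Kruskal on the subgraph induced by vertices.
    Returns (cost, tree_edges), or None if that subgraph is disconnected."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    cost, tree = 0.0, []
    for (u, v), weight in sorted(edges.items(), key=lambda item: item[1]):
        if u in parent and v in parent:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_u] = root_v
                cost += weight
                tree.append((u, v))
    return (cost, tree) if len(tree) == len(vertices) - 1 else None

def min_steiner_tree(vertices, edges, terminals):
    """Brute force: try every set of additional non-terminals."""
    others = [v for v in vertices if v not in terminals]
    best = None
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            candidate = spanning_tree(set(terminals) | set(extra), edges)
            if candidate is not None and (best is None or candidate[0] < best[0]):
                best = candidate
    return best

# Three terminals around a central non-terminal s.
edges = {("s", "a"): 1.0, ("s", "b"): 1.0, ("s", "c"): 1.0,
         ("a", "b"): 2.0, ("b", "c"): 2.0, ("a", "c"): 2.0}
cost, tree = min_steiner_tree({"s", "a", "b", "c"}, edges, {"a", "b", "c"})
```

In the example, connecting the terminals directly costs 4, whereas routing
through s costs 3, so the returned tree uses the three edges incident to s.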

Lemma 8. If FIND-FACET does not return an inequality,
$x^* \in \mathrm{conv}(\mathcal{SG}(S))$.

Proof. Consider the dual of the linear program in Line 3:
$\max \sum_i \lambda_i$, $T^T \lambda \leq x^*$, $\lambda \geq 0$, which has the
optimal value $x^* \cdot a \geq 1$. We divide $\lambda$ by $x^* \cdot a$, with
the consequence that $\sum_i \lambda_i = 1$. Now, $T^T \lambda$ is a convex
combination of Steiner trees and it still holds that $T^T \lambda \leq x^*$.

Lemma 9. If FIND-FACET returns an inequality $a \cdot x \geq 1$, this inequality
is a valid, separating, and facet-defining inequality.

Proof. The value of the last computed minimum Steiner tree $t$ is
$t \cdot a \geq 1$. Therefore, if $x \in \mathcal{SG}(S)$, the value can only be
greater and it holds that $x \cdot a \geq t \cdot a \geq 1$.

As $x^* \cdot a < 1$, the inequality is separating.

From the basic solution of the linear program, we can extract $|E|$ linearly
independent rows that are satisfied with equality. For each such row of the form
$a \cdot t \geq 1$, we add the tree $t$ to a set $S_\lambda$ and for each row
$a_e \geq 0$, we add the edge $e$ to a set $S_\mu$. Note that
$|S_\lambda| + |S_\mu| = |E|$ and the incidence vectors corresponding to
$S_\lambda \cup S_\mu$ are linearly independent.

There is at least one tree $t_j$ in $S_\lambda$. For each edge $e \in S_\mu$ we
add to $S_\lambda$ a new Steiner graph $t_k$ that consists of $t_j$ together
with the edge $e$. Since $a_e = 0$, we know that $a \cdot t_k = 1$. Since the
incidence vectors corresponding to $S_\lambda \cup S_\mu$ were linearly
independent, replacing $e$ with $t_k$ yields a new set of linearly independent
vectors.

Repeating this procedure yields $|E|$ linearly independent $t_i \in S_\lambda$
with $a \cdot t_i = 1$. Thus, $a \cdot x \geq 1$ is a facet.

As in [4], we can improve the running time of the algorithm by using the
following fact. If we know some valid inequalities $a \cdot x \geq b$ with
$a \cdot x^* = b$, then $x^* \in \mathrm{conv}(\mathcal{SG}(S)) \Leftrightarrow
x^* \in \mathrm{conv}(\mathcal{SG}(S) \cap \{x \in \mathbb{R}^{|E|} \mid
a \cdot x = b\})$. Thus we can temporarily remove all edges $(v_i, v_j)$ with
$x^*_{(v_i, v_j)} = 0$, since $x_{(v_i, v_j)} \geq 0$ is a valid inequality.
Call the resulting instance $S'$. We use our algorithm to find a facet of
$\mathrm{conv}(\mathcal{SG}(S'))$. We can use sequential lifting to obtain a
facet of $\mathrm{conv}(\mathcal{SG}(S))$. For details see [4] and Theorem 4.2
of [7].
4.3 Directed versus Undirected Formulations

For computing the lower bounds, we focus on the directed cut formulation,
because its relaxation is stronger than the undirected variant. However, in the
local cut separation algorithm we want to solve undirected Steiner graph
instances, since they can be solved much faster.

The solution is to use another linear mapping that maps arc-values of a
bidirected Steiner graph instance $\vec{S} = (V, A, c, R)$ to edge-values of an
undirected Steiner graph instance $S = (V, E, c', R)$.

We define $S$ by $E = \{(v_i, v_j) \mid [v_i, v_j] \in A\}$ and
$c'_{(v_i, v_j)} = c_{[v_i, v_j]}$.

For a vector $x \in \mathbb{R}^{|A|}$ we define $\psi(x) \in \mathbb{R}^{|E|}$
by $\psi(x)_{(v_i, v_j)} = x_{[v_i, v_j]} + x_{[v_j, v_i]}$.

Lemma 10. $x^* \in \mathrm{conv}(\mathcal{SG}(\vec{S})) \Rightarrow
\psi(x^*) \in \mathrm{conv}(\mathcal{SG}(S))$.

$x \in \mathrm{conv}(\mathcal{SG}(S)) \Rightarrow \exists\,
x^* \in \mathrm{conv}(\mathcal{SG}(\vec{S}))$ with $\psi(x^*) = x$.

If $c \cdot x^*$ is smaller than the cost of an optimal Steiner arborescence,
then $\psi(x^*) \notin \mathrm{conv}(\mathcal{SG}(S))$.
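
The mapping $\psi$ simply adds the values of the two opposite arcs of each edge.
A minimal Python sketch, assuming arc values are stored in a dictionary keyed by
(tail, head) and edges are represented as `frozenset({u, v})` (both conventions
are assumptions for this example):

```python
from collections import defaultdict

def psi(x_arcs):
    """Map arc values to edge values: psi(x)_(u,v) = x_[u,v] + x_[v,u]."""
    x_edges = defaultdict(float)
    for (tail, head), value in x_arcs.items():
        x_edges[frozenset((tail, head))] += value
    return dict(x_edges)

x_edges = psi({("a", "b"): 0.25, ("b", "a"): 0.5, ("a", "c"): 1.0})
```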

For lifting the undirected edges to directed arcs, one can use the computation
of optimal Steiner arborescences. For the actual implementation, we used a faster
lifting using a lower bound to the value of an optimal Steiner arborescence,
provided by the fast algorithm DUAL-ASCENT [17, 25]. For producing facets for
the directed Steiner problem, one could compute optimal Steiner arborescences
in the FIND-FACET algorithm of Section 4.2.

5 Some Experimental Results 

In this section, we present experimental results showing the impact of the
methods described before. In this paper we confine ourselves to the presentation
of some highlights, namely the largest benchmark instances ever solved (Table 1).
Experiments on smaller instances show that vertex splitting can also
significantly improve the solution time (some additional experimental results
are presented in [1]). Note that in the TSP context, local cuts were helpful
particularly for the solution of very large instances.

We have chosen the approach of applying these techniques together with 
the reduction methods [17], because this is the way they are actually used in 
our program package. Note that without the reductions, the impact of these 
techniques would be even more impressive, but then these instances could not 
be handled in reasonable time. 

All results were obtained with a single-threaded run on a Sunfire 15000 with
900 MHz SPARC III+ CPUs, using the operating system SunOS 5.9. We used
the GNU g++ 2.95.3 compiler with the -O4 flag and the LP-solver CPLEX
version 8.0.
Table 1. Results on large benchmark instances. In all cases, the lower bound
reached the value of the integer optimum (and a tree with the same value was
found). A dash means that the instance was already solved to optimality without
local cuts. For the instance d15112, we used the program package
GeoSteiner-3.1 [24] to translate the TSPLIB [21] instance into an instance of
the Steiner problem in networks with rectilinear metric. No benchmark instance
of this size has been solved before. The SteinLib [22] instances es10000 and
fnl4461 were obtained in the same way. Warme et al. solved the es10000 instance
using the MSTH-approach [23] and local cuts. They needed months of CPU time. The
instance fnl4461 was the largest previously unsolved geometric instance in
SteinLib. The SteinLib instance lin37 originates from a VLSI-layout problem, is
not geometric, and was not solved by other authors. Without lower bound
improvement techniques, the solution of the instances would take much longer (or
would not even be possible in the case of d15112). The number of vertex splits
was 8 (lin37), 21 (es10000), 173 (fnl4461) and 321 (d15112). For d15112 only
one additional local cut computation was necessary.



Instance   Orig. size      Red.    Red. size     LP_{C+FB}             + vertex splitting   + local cuts
           |V|     |R|     time    |V|    |R|    val          time     val         time     val      time
d15112     51886   15112   5h      22666  7465   1553831.5    20.4h    1553995     21.9h    1553998  21.9h
es10000    27019   10000   9S8s    4061   1563   716141953.5  251s     716174280   284s     -
fnl4461    17127   4461    995s    8483   2682   182330.8     5299s    182361      6353s    -
lin37      38418   172     2Sh     2529   106    99554.5      1810s    99560       1860s    -


6 Concluding Remarks 

We presented two theoretically interesting and empirically successful approaches
for improving lower bounds for the Steiner tree problem: vertex splitting and
local cuts. Vertex splitting is a new technique and improves the lower bounds
much faster than the local cut method, but the local cut method has the
potential of producing tighter bounds. Vertex splitting, although inspired by a
general approach (see Section 1), is not directly transferable to other
problems, while local cuts are a more general paradigm. On the other hand, the
application of local cuts needs some effort, e.g., developing proofs for shrinks
and an implementation using exact arithmetic. A crucial point is the development
of heuristic shrinks, where a lot of intuition comes into play and we believe
that there is room for improvement. Although the local cut method was originally
developed for the Traveling Salesman Problem, its application is much clearer
for the Steiner tree problem.

Both methods are particularly successful if there are some local deficiencies 
in the linear programming solution. On constructed pathological instances the 
lower bounds are still improved significantly, but the progress is not fast enough 
to solve such instances efficiently. 

Another interesting observation is that the power of the vertex splitting
approach can be improved by looking at multiple roots simultaneously. In fact,
we do not know any instance where repeated vertex splittings would not bring the
lower bound to the integer optimum if multiple roots are used. It remains an
open problem to find out if this is always the case.
Abstract. In this work we study the important problem of colouring
squares of planar graphs (SQPG). We design and implement two new
algorithms that colour SQPG in a different way. We call these algorithms
MDsatur and RC. We have also implemented and experimentally evaluated
the performance of most of the known approximation colouring
algorithms for SQPG [14, 6, 4, 10]. We compare the quality of the
colourings achieved by these algorithms with the colourings obtained by
our algorithms and with the results obtained from two well-known greedy
colouring heuristics. The heuristics are mainly used for comparison
and unexpectedly give very good results. Our algorithm MDsatur
outperforms the known algorithms, as shown by the extensive experiments
we have carried out.

The planar graph instances whose squares are used in our experiments
are “non-extremal” graphs obtained by LEDA and hard to colour graph
instances that we construct.

The most interesting conclusions of our experimental study are:

1) All colouring algorithms considered here have almost optimal
performance on the squares of “non-extremal” planar graphs.

2) All known colouring algorithms especially designed for colouring
SQPG give significantly better results, even on hard to colour graphs,
when the vertices of the input graph are randomly named. On the other
hand, the performance of our algorithm, MDsatur, becomes worse in this
case; however, it still has the best performance compared to the others.
MDsatur colours the tested graphs with 1.1 OPT colours in most of the
cases, even on hard instances, where OPT denotes the number of colours
in an optimal colouring.

3) We construct worst case instances for the algorithm of Fotakis
et al. [6], which show that its theoretical analysis is tight.



1 Introduction 

Communication in wireless systems, such as radio networks and ad-hoc mobile
networks, is accomplished through exploitation of a limited range of frequency
spectrum. One of the mechanisms utilised is to reuse frequencies where this does
not result in unacceptable levels of signal interference. In graph theoretic terms,
the interference between transmitters is usually modelled by the interference
graph G = (V, E), where V corresponds to the set of transmitters and E
represents distance constraints (e.g. if two adjacent vertices in G get the same or
nearby frequencies, then this causes unacceptable levels of interference).

* This work has been partially supported by the EU IST/FET projects ALCOM-FT
and CRESCCO.

Maria I. Andreou et al.
K. Jansen et al. (Eds.): WEA 2003, LNCS 2647, pp. 15-32, 2003.
(c) Springer-Verlag Berlin Heidelberg 2003

In most real systems, the network topology has some special properties, e.g. G 
is a lattice network or G is a planar graph. Planar graphs are the object of study 
in this work, both because of their importance in real networks and also because 
of their independent combinatorial interest. 

The Frequency Assignment Problem (FAP) is usually modelled by variations
of the vertex colouring problem on graphs. The set of colours represents the
available frequencies. In addition, in an acceptable assignment each vertex
of G receives an integer colour which has to satisfy certain inequalities with
respect to the colours of its nearby vertices in G (frequency-distance
constraints). The FAP has been considered, e.g., in [7, 8, 11].

Definition 1. (k-colouring) Given a graph G = (V, E), let
$F : V \to \{1, \ldots, \infty\}$ be a function, called a k-colouring of G, such
that for each $u, v \in V$ it holds that $|F(u) - F(v)| \ge x$ for
$x = 0, 1, \ldots, k$ if $D(u, v) \le k - x + 1$. D(u, v) denotes the distance
between u and v in G, i.e. the length of the shortest path joining them. The
number of colours used by F is called order and is denoted by
$\lambda = |F(V)|$. The range of them is denoted by
$s = \max_{v \in V} F(v) - \min_{u \in V} F(u) + 1$ and is called span.

Definition 2. (Radiocolouring, RCP) When k = 2 then the function F is called
a radiocolouring of G of order $\lambda$ and span s. The radiochromatic number
of G is the least order for which there exists a radiocolouring of G and is
denoted by $X_{order}(G)$. The least span is denoted as $X_{span}(G)$, respectively.
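
As an illustration of Definitions 1 and 2, the following minimal sketch checks the k = 2 case (vertices at distance one must differ by at least 2, vertices at distance two by at least 1) and computes the order and span of a colouring. The dictionary-of-sets graph representation and the function names are assumptions made for this example only; they are not taken from the authors' C implementation.

```python
from collections import deque


def distances_up_to_two(adj, source):
    """Return {vertex: distance} for all vertices within distance 2 of source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == 2:
            continue  # do not explore beyond distance two
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_radiocolouring(adj, colour):
    """Check Definition 1 for k = 2 on a graph given as {vertex: set of neighbours}."""
    for u in adj:
        for v, d in distances_up_to_two(adj, u).items():
            if v == u:
                continue
            required = 2 if d == 1 else 1
            if abs(colour[u] - colour[v]) < required:
                return False
    return True


def order_and_span(colour):
    """Order = number of distinct colours, span = max - min + 1."""
    used = set(colour.values())
    return len(used), max(used) - min(used) + 1
```

For the path a-b-c, the assignment a = 1, b = 4, c = 2 passes the check and has order 3 and span 4, whereas a = 1, b = 2, c = 3 fails because adjacent vertices differ by only 1.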

A variation of RCP useful in practice is the following: 

Definition 3. The min order min span radiocolouring on a graph G is a function 
that radiocolours G with the least number of distinct colours while the range of 
colours used is the least possible. 

It can easily be seen that the radiochromatic number of any graph G,
$X_{order}(G)$, is equal to the chromatic number of its square, i.e.
$\chi(G^2)$ [8].

Definition 4. The square of a graph G is denoted by $G^2$ and is a graph on the
same vertex set as G with an edge between any pair of vertices of distance at
most two in G. The graph G is a square root of graph $G^2$.
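
Definition 4 can be sketched in a few lines. The representation of a graph as a dictionary mapping each vertex to the set of its neighbours, and the name `square`, are assumptions of this example.

```python
def square(adj):
    """Return the adjacency sets of G^2 for a graph given as {vertex: set of neighbours}."""
    sq = {v: set(nbrs) for v, nbrs in adj.items()}
    for v, nbrs in adj.items():
        for u in nbrs:
            sq[v] |= adj[u]  # vertices reachable from v through u (distance two)
        sq[v].discard(v)     # no self-loops
    return sq
```

For instance, the square of a star with three leaves is the complete graph on four vertices, which is why a vertex of degree $\Delta$ forces at least $\Delta + 1$ colours in $G^2$.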

In this work we experimentally study the colouring problem on the square
of any planar graph (i.e. the min order RCP) and the min order min span RCP
on planar graphs. Ramanathan and Lloyd proved that the first of the above
problems is NP-complete [15]. Fotakis et al. proved the NP-completeness of RCP
on planar graphs [6]. Thus, efficient solutions for these problems necessarily rely
on the choice of heuristics and approximation algorithms. To our knowledge, for
the colouring of SQPG, there is a 1.66-approximation algorithm due to Molloy
and Salavatipour [14] that currently achieves the best approximation ratio. We
are aware of another three approximation colouring algorithms designed for
these graphs [10, 6, 4]. The first two of them have approximation ratio 2 and
the third one has ratio 1.8.

Towards a global picture of the problem, our work studies a variety of
algorithms that have distinct methodologies. In this direction we have first
implemented a new algorithm that we propose for colouring SQPG. We call this
algorithm MDsatur, because it is a non-trivial modification of the well-known
colouring algorithm Dsatur of Brelaz [5]. In our opinion, our approach is novel,
because all known algorithms for this problem colour the uncoloured vertices
of a given graph mainly based on the number of neighbours of each of these
vertices either in graph G, or in the subgraph of $G^2$ that has already
been coloured. On the other hand, our algorithm colours the uncoloured vertices
mainly based on the number of distinct colours that have already been assigned
to their neighbours.

We have also implemented the algorithm which is presented and theoretically
analysed in the work of Fotakis et al. [6] and we call it FNPS. We also considered
the algorithm of Agnarsson and Halldórsson presented in [4] and we call it AH.
This algorithm actually concerns planar graphs of large maximum degree
$\Delta \ge 749$ only. It is proved in [4] that the upper bound achieved by AH
is tight.

To make our experimental study on colouring SQPG more complete we have
also implemented two well-known greedy colouring heuristics, which colour any
graph. The first one is the Randomised First Fit (RFF). The second, which we
call MaxIScover, covers the given graph by maximal independent sets.

Finally, we also propose and implement a new min order min span
radiocolouring algorithm, which we call RadioColouring (RC). RC min order min
span radiocolours any planar graph G given a colouring of $G^2$. Based on its
results we conjecture that $X_{span}(G) - X_{order}(G)$ is bounded from
above by a small constant. So any efficient algorithm that solves the min order
min span problem on planar graphs can be used as an efficient algorithm for the
colouring problem on SQPG and vice-versa.

We use three sets of planar graph instances to evaluate the performance of
the above algorithms. The first one, S1, has graphs which are obtained from
the graph generation library of LEDA [12]. These graphs are “non-extremal”,
because of the randomness in their generation. In the second set, S2, we create
graphs that are modifications of the graph instances of the first set. We
expected that the graphs of set S2 would be harder to colour for
the colouring algorithms studied. The last set, S3, has hard to colour graph
instances, at least for our algorithm MDsatur and for algorithm AH.

From the experimental results we observe that each one of the colouring
algorithms considered in this work colours the square of any planar graph from
set S1 with $\Delta + c$ colours, where c is a small constant (i.e. 4). It can
easily be seen that $\Delta + 1$ colours are necessary. The results of these
algorithms are similar on the graphs of set S2 as well. On the other hand, the
results on the harder to colour graph instances of set S3 demonstrate the
differences in the performance of each one of the algorithms considered. In
particular, we remark that MDsatur has the best experimental performance (EP)
in all the cases. We measure the EP as the ratio of the number of colours the
algorithm uses over the lower bound value on the chromatic number of the square
of any graph G, that is $\Delta(G) + 1$. Our algorithm has EP less than 1.1 in
most of the cases, even on hard to colour instances, while its theoretically
claimed approximation ratio is 1.5 [2]. Heuristic MaxIScover has the second
best EP. Heuristic RFF also gives very good results. There are no theoretical
results for these heuristics. Thus, we find their experimental results very
interesting. The other algorithms give more or less expected results, much
better than their theoretical upper bounds. For example, FNPS has EP ranging
from 1.1 to 1.4. However, we present in Section 3 a planar graph G and an order
of its vertices such that if algorithm FNPS selects the vertices according to
this order, then it needs $2\Delta$ colours to colour $G^2$. This proves that
the theoretical upper bound of this algorithm is tight.
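
The EP measure described above is simple enough to state as code. This is only a sketch; the function name and the dictionary representation of a colouring are assumptions of this example.

```python
def experimental_performance(colouring, max_degree):
    """EP = number of colours used / (Delta + 1), where Delta + 1 is the lower bound."""
    colours_used = len(set(colouring.values()))
    return colours_used / (max_degree + 1)
```

An EP of 1.0 therefore means that the colouring of $G^2$ meets the lower bound $\Delta(G) + 1$.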

All our implementations are written in the C programming language. The
programmes run on a Sun Enterprise Solaris 2.6 system with 4 processors at
300MHz and 256 MB RAM. The source code and the planar graph instances
used are available at: http://students.ceid.upatras.gr/~mandreou/.



2 Description of the Algorithms 

The Algorithm MDsatur. We present here our algorithm MDsatur. As we
have said in the introduction, this algorithm is based on a novel approach for the
colouring problem on SQPG and is inspired by algorithm Dsatur of Brelaz [5].
Dsatur colours very well, even optimally, graphs whose chromatic number is
strongly related to their clique number, e.g. random graphs and k-colourable
random graphs [17, 16]. The chromatic number of SQPG is also strongly
related to their clique number, because $\chi(G^2) \le 1.66\Delta(G) + 24$ ([14]).
Thus, $\chi(G^2) \le 1.66\omega(G^2) + 24$ (where G is a planar graph and
$\omega(G^2)$ is the size of the maximum clique in $G^2$).

Algorithm Dsatur greedily colours a given graph G. At each point it colours
a vertex that has the maximum degree of saturation with the smallest
allowable colour. The degree of saturation, Ds, of a vertex is the number of
distinct colours already assigned to its neighbours.
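
The Dsatur rule just described can be sketched as follows. The tie-breaking by vertex degree is an assumption of this example (the text above does not specify how ties are broken), as are the function name and the dictionary-of-sets graph representation.

```python
def dsatur(adj):
    """Greedy colouring that always picks a vertex of maximum saturation degree."""
    colour = {}

    def key(v):
        # Saturation degree: number of distinct colours on coloured neighbours.
        saturation = len({colour[u] for u in adj[v] if u in colour})
        # Assumed tie-break: prefer vertices of larger degree.
        return (saturation, len(adj[v]))

    while len(colour) < len(adj):
        v = max((x for x in adj if x not in colour), key=key)
        used = {colour[u] for u in adj[v] if u in colour}
        c = 1
        while c in used:  # smallest allowable colour
            c += 1
        colour[v] = c
    return colour
```

To colour a square of a planar graph with this sketch, one would pass the adjacency sets of $G^2$ rather than those of G.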

Algorithm MDsatur colours the vertices of a given graph $G^2$ in an order O
that differs from that of Dsatur, because the vertices appended to O have to
satisfy more restrictions. More precisely, let i be the current point of MDsatur
in its application on $G^2$. Then the currently uncoloured vertex that becomes
the vertex in position i of O, say v, satisfies the following requirements
in turn: R1) it has the maximum value on its degree of saturation at the current
point of this algorithm (as in Dsatur), R2) it has the oldest coloured neighbour
(i.e. this neighbour sits in the leftmost position in O among the oldest
neighbours of all the uncoloured vertices) and R3) it is the ‘closer’ neighbour
of vertex $O_{i-1}$ (namely the last vertex coloured). We provide below a
pseudo-code description of MDsatur.

Algorithms and Experiments on Colouring Squares of Planar Graphs



Algorithm: MDsatur

Input: a graph G = (V, E). Output: a colouring of G.

Begin
1. (a) Set each vertex of G as uncoloured, and set a dynamic order on them,
       called O, as empty.
   (b) Select any vertex of G. Add it in the beginning of O (in position 1
       of O), and colour it with colour 1.
2. For i = 2 to n = |V| do
   (a) Set X to be the set of the current uncoloured vertices which have the
       maximum value on their degree of saturation (Requirement R1).
   (b) For each vertex $u \in X$ do
         Find the first neighbour of u coloured (i.e. its oldest coloured
         neighbour) (Requirement R2).
   (c) Select from X the vertex v which has the oldest coloured neighbour,
       denoted by ON, and also has the maximum number of common neighbours
       with the last coloured vertex (namely $O_{i-1}$) (Requirement R3).
       If more than one of the vertices of X satisfy these requirements,
       then select one of them at random with equal probability.
       If none of the uncoloured neighbours of ON, which are in X, has
       common neighbours with vertex $O_{i-1}$, then select one of them at
       random.
       Set $O_i = v$.
   (d) Colour v with the smallest of its allowable colours, i.e. those that
       are not assigned to its neighbours.
       If none of the colours used is allowable for this vertex, insert a
       new colour.
3. Return the colour of each vertex.
End
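
The pseudo-code above can be sketched in Python as follows. This is one possible reading, not the authors' C implementation: requirements R1, R2 and R3 are applied as successive filters on the candidate set, the first vertex is taken to be the first key of the dictionary, and a seeded random generator is used for ties. The function name, the `seed` parameter and the dictionary-of-sets representation are assumptions of this example.

```python
import random


def mdsatur(adj, seed=0):
    """Colour a graph given as {vertex: set of neighbours}; returns (colour, order)."""
    rng = random.Random(seed)
    vertices = list(adj)
    first = vertices[0]             # step 1(b): any vertex
    order = [first]                 # the dynamic order O
    pos = {first: 0}                # position of each coloured vertex in O
    colour = {first: 1}

    def oldest(v):
        # Position in O of the oldest coloured neighbour of v.
        return min((pos[w] for w in adj[v] if w in pos), default=len(vertices))

    while len(order) < len(vertices):
        uncoloured = [v for v in vertices if v not in colour]
        sat = {v: len({colour[w] for w in adj[v] if w in colour}) for v in uncoloured}
        best = max(sat.values())
        candidates = [v for v in uncoloured if sat[v] == best]              # R1
        oldest_pos = min(oldest(v) for v in candidates)
        candidates = [v for v in candidates if oldest(v) == oldest_pos]     # R2
        last = order[-1]
        common = {v: len(adj[v] & adj[last]) for v in candidates}
        most = max(common.values())
        candidates = [v for v in candidates if common[v] == most]           # R3
        v = rng.choice(candidates)  # random choice among the remaining ties
        used = {colour[w] for w in adj[v] if w in colour}
        c = 1
        while c in used:            # step 2(d): smallest allowable colour
            c += 1
        colour[v] = c
        pos[v] = len(order)
        order.append(v)
    return colour, order
```

In the setting of the paper the argument would be the adjacency structure of $G^2$. On a connected input every selected vertex has saturation degree at least one, so each prefix of the returned order induces a connected subgraph.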



The ‘closer’ neighbour of a vertex u is the one whose colouring is
determined mostly by the colours of the other neighbours of u. In most cases
this neighbour of u lies near it in a planar embedding of G. Note that we
manage to approximate well the nearby vertices in a planar embedding of G
without actually finding any such embedding. As a consequence, our
algorithm can colour any graph, which makes its results more interesting.

At this point, let us just provide an intuitive explanation of why this
algorithm gives better results than the original algorithm Dsatur on SQPG. The
main reason is that MDsatur colours the given graph more locally, based on
the last coloured vertex. Also, the subgraph of the given graph that has already
been coloured at any point of the application of MDsatur on this graph is more
compact (dense) than the corresponding subgraph obtained from Dsatur. We
explain how these properties of our algorithm are achieved using an example
and we mention their consequences on the resulting colouring.

Fig. 1. The graphs (a) and (b) that are used in order to explain why MDsatur
colours a given graph more locally and compactly than Dsatur

Let G be a planar graph and suppose that MDsatur is applied on $G^2$.
Also suppose that at an intermediate point of this application the following
hold: Let {C1, C2, C3, C4, ...} be a sequence of partially coloured cliques and
{u1, u2, u3, u4, ...} be a sequence of uncoloured vertices. Each vertex uj
(j = 1, 2, 3, ...) belongs to both cliques Cj and C(j + 1) (see Fig. 1a). We
assume that the values of the degree of saturation of all these vertices are the
same. Then it would be possible to colour all these vertices sequentially before
the colouring of other vertices which belong to the above cliques.

This fact quite possibly yields an increase in the number of colours used,
because it is possible to have vertices (like vertices x, y, z in the figure) such
that their degree of saturation, at the time they are coloured, is mainly computed
based on the colours of non-adjacent vertices.

To avoid the above situation, we focus our effort on guiding algorithm
Dsatur so that it colours a given graph as compactly as possible. In this
direction we add requirement R2 in our algorithm, MDsatur. This has to be
satisfied by the next vertex to be coloured at each point of this algorithm. The
behaviour of our algorithm on the above example is explained in the following.
Let us call $ON_{u1}$ ($ON_{u2}$) the oldest coloured neighbour of vertex u1 (u2).
If u1 is the next vertex to be coloured, then this means that vertex $ON_{u1}$
was coloured before vertex $ON_{u2}$ (due to requirement R2). If
$ON_{u1} \in C1$ (observe that $u1' \in C1$), then according to this algorithm
the next vertex to be coloured cannot be vertex u2, because the oldest coloured
neighbour of u1' is at least as old as vertex $ON_{u1}$. Thus the next vertex to
be coloured is vertex u1', which extends only the same clique as the predecessor
vertex coloured (namely u1). So, the first of our goals is achieved on the
colouring obtained by MDsatur. However, if $ON_{u1} \notin C1$, then
unfortunately it is possible to colour vertex u2 before vertex u1'.

Finally, requirement R3 is used in order to colour in turn nearby
vertices (in an embedding of graph G) of the same clique. Namely, we try to
colour these vertices locally. As we have said before, the ‘closer’ neighbour
of the last vertex coloured lies near its location in a planar embedding of
graph G. Fig. 1b is representative of this case. Let u be the next vertex to be
coloured. Observe that the colouring of this vertex influences the value of the
degree of saturation of each other vertex in the square of this graph. Suppose
that all these vertices have the same value on their degree of saturation. By
requirement R3 our algorithm colours after vertex u a vertex from set $g_1$,
because only vertices of this set (i.e. neighbours of u in the square of this
graph) have the maximum number of common neighbours with the last vertex
coloured (u).

Lemma 1. (Connectivity Property) Suppose that algorithm MDsatur is 
applied on a connected graph G. If S is the subgraph of G that has already 
been coloured at any point of MDsatur during its application on G, then S is 
a connected graph. 

Proof. See full version [1]. □ 

Remark 1. Among the algorithms designed for colouring SQPG, [4, 6, 14, 2],
the only one which has the Connectivity Property is algorithm MDsatur. This
property of MDsatur mainly differentiates it from the others and is one of the
key points on which the analysis of its performance in [2] is based.



The FNPS Algorithm. FNPS was first presented in [6]. It is a
2-approximation algorithm on SQPG.

Lemma 2. [10] Let G be a planar graph. Then there exists a vertex v in G
with k neighbours, say $v_1, v_2, \ldots, v_k$, with
$d(v_1) \le d(v_2) \le \cdots \le d(v_k)$ such that one of the following is true:

(i) $k \le 2$; (ii) k = 3 with $d(v_1) \le 11$; (iii) k = 4 with $d(v_1) \le 7$
and $d(v_2) \le 11$; (iv) k = 5 with $d(v_1) \le 6$, $d(v_2) \le 7$ and
$d(v_3) \le 11$, where d(v) is the degree of vertex v.

Definition 5. Let G be a graph and e = uv be one of its edges. Set as G' = G/e
the graph that is produced from G by deleting the vertex u from it and joining
each one of its neighbours to the vertex v.
FNPS Algorithm

Input: a planar graph G and its square $G^2$. Output: a colouring of $G^2$.

Begin
1. Sort the vertices of G by their degree.
2. If $\Delta \le 12$ then follow Procedure 1 below:
   Procedure 1: Every planar graph G has at least one vertex of degree
   $\le 5$. Now, inductively assume that any proper (in vertices) subgraph
   of $G^2$ can be coloured by 66 colours. Consider a vertex v in G with
   degree(v) $\le 5$. Delete v from G to get G'. Now recursively colour
   $G'^2$ with 66 colours. The number of colours that v has to avoid is at
   most $5\Delta + 5$. Thus, there is one free colour for v.
3. If $\Delta \ge 13$ then
   (a) Find a vertex v and a neighbour $v_1$ of it, as described in Lemma 2,
       and set $e = vv_1$.
   (b) Form G' = G/e and modify the sorted list of vertices according to
       their new degrees.
   (c) $F(G') = FNPS(G', G'^2)$, that is the colouring which is produced by
       the recursive call of this function on G' and $G'^2$.
   (d) Extend F(G') to a valid colouring of $G^2$ as follows:
       Colour v with one of the proper colours used in the colouring F of G'.
       If no colour used is allowable for vertex v, insert a new one.
End



The Algorithm AH. We note that algorithm AH colours the square of a
planar graph G if $\Delta(G) \ge 749$.

AH Algorithm

Input: a graph $G^2$ and the maximum degree of G, $\Delta$.
Output: a colouring of $G^2$.

Begin
1. inductiveness = 1.8;
2. While there are uncoloured vertices do
   (a) find any uncoloured vertex, say v, with degree less than
       inductiveness * $\Delta$ in the current subgraph of $G^2$ that has
       already been coloured
   (b) colour v with the smallest allowable colour
   (c) delete v from G
3. Return the colour of each vertex.
End
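
A literal reading of the AH pseudo-code can be sketched as below. Deleting v is modelled simply by marking it as coloured. The fallback used when no vertex satisfies the degree condition is an assumption added for this example so that the loop always terminates on arbitrary inputs; it is not part of the description above, and neither are the function and parameter names.

```python
def ah_colour(sq_adj, max_degree, inductiveness=1.8):
    """Greedy colouring of G^2 (given as {vertex: set of neighbours in G^2})."""
    bound = inductiveness * max_degree
    colour = {}

    def coloured_degree(v):
        # Number of neighbours of v in G^2 that are already coloured.
        return sum(1 for w in sq_adj[v] if w in colour)

    while len(colour) < len(sq_adj):
        uncoloured = [v for v in sq_adj if v not in colour]
        # Step 2(a): any uncoloured vertex whose coloured degree is below the bound.
        v = next((x for x in uncoloured if coloured_degree(x) < bound), None)
        if v is None:
            # Assumption (not in the pseudo-code): take the least constrained vertex.
            v = min(uncoloured, key=coloured_degree)
        used = {colour[w] for w in sq_adj[v] if w in colour}
        c = 1
        while c in used:  # step 2(b): smallest allowable colour
            c += 1
        colour[v] = c
    return colour
```

Whenever step 2(a) succeeds, the selected vertex has fewer than $1.8\Delta$ coloured neighbours, so the smallest allowable colour is at most $\lceil 1.8\Delta \rceil$.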



The Heuristic RFF. Let G be a graph and O a random order on its vertices.
RFF assigns to each vertex the first colour that is not assigned to any one of
its previous neighbours (i.e. those that precede the vertex in order O).
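
A minimal sketch of RFF follows. The function name and the optional `seed` parameter (added so that runs can be reproduced) are assumptions of this example.

```python
import random


def rff(adj, seed=None):
    """Randomised First Fit on a graph given as {vertex: set of neighbours}."""
    rng = random.Random(seed)
    order = list(adj)
    rng.shuffle(order)  # the random order O
    colour = {}
    for v in order:
        # Colours of the neighbours that precede v in O.
        used = {colour[w] for w in adj[v] if w in colour}
        c = 1
        while c in used:
            c += 1
        colour[v] = c
    return colour
```

As with any first-fit colouring, the number of colours used never exceeds the maximum degree of the input graph plus one.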
The Heuristic MaxIScover. This heuristic finds a cover of the given graph G
by maximal independent sets (MISs). Namely, it uses an algorithm that finds
a MIS in G. Then, it colours the vertices of this set with a new colour and it
deletes them from the current graph G. This process is repeated until
the graph G becomes empty. Here we use heuristic MaxIS [3] to find a maximal
independent set of the current graph G. It is clear that instead of this algorithm
any algorithm which finds MISs in SQPG can be used.
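
The covering scheme can be sketched as follows. The heuristic MaxIS of [3] is not described here, so this example substitutes a simple greedy minimum-degree rule for finding a maximal independent set; that substitution, together with the function names, is an assumption of the sketch.

```python
def maximal_independent_set(adj, alive):
    """Greedy MIS of the subgraph induced by `alive` (stand-in for MaxIS [3])."""
    remaining = set(alive)
    mis = set()
    while remaining:
        # Pick a vertex with the fewest neighbours among the remaining ones.
        v = min(remaining, key=lambda x: len(adj[x] & remaining))
        mis.add(v)
        remaining -= adj[v] | {v}  # v and its neighbours can no longer be chosen
    return mis


def mis_cover_colouring(adj):
    """Colour a graph by repeatedly removing a maximal independent set."""
    alive = set(adj)
    colour = {}
    c = 0
    while alive:
        c += 1
        mis = maximal_independent_set(adj, alive)
        for v in mis:
            colour[v] = c
        alive -= mis
    return colour
```

Because every set is maximal in the current graph, a vertex that receives colour c has a neighbour in each of the c - 1 earlier sets, so no vertex gets a colour larger than its degree plus one.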



The RadioColouring Algorithm. We propose an algorithm that finds a min
order min span radiocolouring. Since our experiments seem to suggest that in
planar graphs the difference between $X_{span}(G)$ and $X_{order}(G)$ is a small
constant (e.g. 4), the algorithm below can also be used to evaluate the chromatic
number of the square of a planar graph.

Algorithm RadioColouring (RC)

Input: a graph G and a colouring of its square, say $Col(G^2)$.
Output: a min order min span radiocolouring of G, say rad(G).

Begin
1. Initialise the radiocolouring of G to be the colouring of its square (i.e.
   rad(G) := $Col(G^2)$).
2. Sort the vertices of G in decreasing order by their degrees. Let O be the
   resulting permutation.
3. Consider the vertices of G according to their rank in O. Radiocolour
   each one of them which has a wrong radiocolour, say v, as follows:
   rad[v] := smallest proper radiocolour, based on the radiocolours of the
   neighbours of v in $G^2$ which have bigger degree than its degree (i.e. are
   to the left of vertex v in O).
4. Return the radiocolour of each vertex.
End
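
One way to read the RC pseudo-code is sketched below. The interpretation that a radiocolour is "wrong" when it violates a distance constraint with respect to a vertex further to the left in O is an assumption of this example, as are the function names and the dictionary-of-sets representation.

```python
def square(adj):
    """Adjacency sets of G^2 for a graph given as {vertex: set of neighbours}."""
    sq = {v: set(nbrs) for v, nbrs in adj.items()}
    for v, nbrs in adj.items():
        for u in nbrs:
            sq[v] |= adj[u]
        sq[v].discard(v)
    return sq


def radiocolour(adj, col_sq):
    """Turn a colouring of G^2 into a radiocolouring of G (steps 1 to 4 above)."""
    sq = square(adj)
    rad = dict(col_sq)                                   # step 1
    order = sorted(adj, key=lambda v: -len(adj[v]))      # step 2: decreasing degree
    done = set()                                         # vertices to the left in O

    for v in order:                                      # step 3
        def proper(c):
            for w in sq[v] & done:
                # Distance one needs a gap of 2, distance two a gap of 1.
                need = 2 if w in adj[v] else 1
                if abs(c - rad[w]) < need:
                    return False
            return True

        if not proper(rad[v]):
            c = 1
            while not proper(c):
                c += 1
            rad[v] = c
        done.add(v)
    return rad                                           # step 4
```

Since each pair of vertices within distance two is checked when the later of the two is processed, the result of this sketch satisfies the radiocolouring constraints; whether it attains the minimum order or span is not guaranteed by the sketch itself.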



3 Hard Instances 

In this section we present the planar graphs whose squares are hard to colour
by most of the algorithms considered. Let G be any such graph. Agnarsson and
Halldórsson in [4] characterise the graphs $G^2$ where their algorithm's
performance bounds are tight. These graphs are worst case graph instances for AH.
Andreou and Spirakis in [2] claim which graphs are the hard instances for our
algorithm MDsatur. Here we identify a worst case instance for algorithm FNPS
and based on this we prove that its theoretical analysis is tight.

Based on Agnarsson et al. and Andreou et al. [2, 4] we present here the potential
structures of G. We also present an intuitive explanation of why the
corresponding graphs $G^2$ are hard to colour.
Fig. 2. a) an example of a 5-regular planar graph, the icosahedron. b) the
neighbours of vertex v of the icosahedron in the hard to colour graph obtained
from the icosahedron when k1 = 1, k2 = 2, k3 = 3, k4 = 4, k5 = 5. c) a planar
square root of a clique when there is a cycle of size six in its exterior in a planar
embedding. d) a square root of a clique of size more than $\Delta + 3$. e) the
graph that leads to the proof that the theoretical analysis on the performance of
algorithm FNPS is tight

3.1 Hard Graph Instances for Algorithms AH and MDsatur 

Let G' be a planar r-regular graph. The graph G is obtained from G' by replacing
each of its edges, say e, with a set of vertices and joining them with the endpoints
of the corresponding edge. We remark that $r \le 5$ in the case of planar graphs [1].
An example of a 5-regular planar graph is shown in Fig. 2a.

Observe that each vertex in $G^2$ may belong to two cliques of maximum size.
This happens when the degree of each vertex of G' becomes $\Delta(G)$.



About Algorithm AH. If G' is a 5-regular planar graph, then the graph G
obtained from this graph as described above is one of the graphs which make
the performance of algorithm AH tight. This is true, because if all the edges of
graph G' are replaced by sets of vertices of the same size, i.e. $\Delta/5$, then
each vertex has $1.8\Delta$ neighbours in graph $G^2$. We recall that at each
point of the application of this algorithm on $G^2$ it chooses a vertex of degree
at most $1.8\Delta$ in the current subgraph of $G^2$ that has already been
coloured and it colours it with the first allowable colour. Let v be a vertex of G
and H be the subgraph of G on vertex v and its neighbours (see Fig. 2a). Observe
that $H^2$ is a clique in $G^2$ of size $\Delta + 1$. Thus, if all neighbours of
each vertex of H in $G^2 - H$ have already been coloured using distinct colours
from $\{1, 2, 3, \ldots, 0.8\Delta\}$, then AH has to assign a distinct new colour
to each of them. Thus, AH uses $1.8\Delta$ colours to colour $G^2$.



About Algorithm MDsatur. The graph G that is obtained from a planar
r-regular graph G' as described above also has a square that is hard to colour
by our algorithm MDsatur. Andreou and Spirakis claim this in [2]. The
basic tools used in that work are the Connectivity Property of MDsatur and the
fact that no planar graph has the graph $K_{3,3}$ as a subgraph [9]. Here we
briefly present the basic steps of the proof of the assertion following [2]. Let
MDsatur be applied on a graph $G^2$.

1. Let $H^2$ be a currently uncoloured subgraph of $G^2$. Then MDsatur will
colour its vertices with new colours iff the coloured neighbours of each of
them have all the colours currently used. For example, if the graph H is the
subgraph of G on the vertex v with its neighbours (like in Fig. 2b), then in
order to colour graph $H^2$ with as many new colours as possible, the
vertices of graph $G^2 - H^2$ have to be already coloured. In [2] it is proved
that at most half of the vertices of H can have neighbours coloured with
$\Delta$ distinct colours before their colouring.

2. In that work it is also proved that the structure of graph H that leads to
significantly hard to colour instances for MDsatur is when H is the star
graph (observe that in this case $H^2$ is a clique). Otherwise, if $H^2$ is a
clique and H is not a star graph, then some of its vertices lie inside a cycle C
(in an embedding of H in the plane) and by the planarity of G it is impossible
for them to have neighbouring vertices of G - H that lie outside cycle C (see
Fig. 2c). Thus, because of the connectivity property of MDsatur, it is proved
in [2] that the graphs obtained under these conditions are not harder to
colour than the graphs obtained when H is a star graph. It is also proved
there that the insertion of new vertices between any pair of vertices similar
to (w1, w2) in Fig. 2b does not affect the performance of this algorithm, even
if the number of these new vertices is $\Delta$. This is correct, because by the
connectivity property it is impossible to colour the new vertices after the
colouring of at least six vertices of H. This is true, because the groups of new
vertices (one for each pair of vertices (w1, w2)) lie inside distinct bounded
faces. Hence, their colouring affects only the degree of saturation of vertices
of H.

3. If H is the star graph, it is also proved in [2] that each set of at least four
vertices of H cannot belong to more than two cliques of size at least eight
with at least three distinct vertices each in $G^2$ (because of $K_{3,3}$).
Based on this it is proved in [2] that when H is a star graph, $G^2$ is very
hard to colour by MDsatur. By the argument used at the end of the above step
we conclude that even if vertices of H belong to more than two cliques of
maximum size in $G^2$, this does not affect the performance of our algorithm.

4. Finally, if the graph $H^2$ is a clique of size greater than $\Delta + 3$,
then H has the structure of the graph shown in Fig. 2d, as claimed in [2]. So
again by the connectivity property it is proved in that work that the graph
$G^2$ obtained under this condition is not a hard to colour graph for MDsatur.

3.2 Worst Case Instance for Algorithm FNPS 

In this subsection we prove that the graph G properly obtained from the graph
presented in Fig. 2e shows that the theoretical analysis of FNPS is tight.

Lemma 3. The approximation ratio of algorithm FNPS is tight.

Proof. Let G be the graph obtained from the graph shown in Fig. 2e as follows:
replace each supervertex of set 2 by $\Delta - 1$ new vertices and join each of
the new vertices with the neighbours of the corresponding supervertex. To prove
this lemma it is enough to show that FNPS colours vertex x with colour $2\Delta$.

Suppose that algorithm FNPS first selects the vertex v (see step 3 of the
pseudo-code) to be a vertex from set 1. Observe that the degree of vertex v in this
case is two in the current graph G. Hence there is no need to find a vertex that
satisfies the conditions of vertex $v_1$ (see also the pseudo-code). Subsequently,
FNPS sequentially selects all the vertices of this set and all the vertices from
set 2 and then it deletes them from G. Observe that all these vertices had degree
two in G, so their selection was allowable. Subsequently FNPS can choose
sequentially all the vertices of set 3 and all the vertices of set 4. This is an
allowable action, because the vertices of set 2 have already been deleted from G.
So the vertices of set 3 had degree one at their selection time. Finally we assume
that FNPS selects in turn the vertices u, w and x.

FNPS is a recursive algorithm. Let G' be the current graph G in iteration i
and let v be the vertex that is selected by FNPS at this iteration. Then, from
the way that FNPS colours the vertices of G we conclude that it colours vertex v
after the colouring of the graph $G - \{v\}$. Thus, FNPS colours the vertices in
the order that they are deleted from G. Hence, the vertices of set 1 are first
coloured by colour 1. The vertices of set 2 which have replaced the same
supervertex of the graph of Fig. 2e have the colours
$\{1, 2, \ldots, \Delta - 1\}$. To the vertices of set 3 are assigned the colours
$\{2, \ldots, \Delta - 1\}$. Hence, the vertices of set 4 have the colours
$\{\Delta, \Delta + 1, \Delta + 2, \ldots, 2\Delta - 2\}$. Finally vertex u
receives colour 1, vertex w receives colour $2\Delta - 1$ and vertex x the colour
$2\Delta$. □

4 Graph Generation 

The planar graph instances, which give the squares used as input to the colouring 
algorithms, that are experimentally evaluated in this work, are split in three sets. 
In the first one, we call it SI, are graphs created using the graphs generation 
library of LEDA [12]. More specific, we used the following LEDA procedures: 
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Class Cl: random_planar_graph(G, n, m = 1.5*n); which produces a random 
maximal planar graph G with n — 1 vertices. Then it adds a new vertex into 
a random selected face F and it joins this vertex with all the vertices of face F. 
An appropriate number of edges of graph G are deleted randomly in order to 
have only m edges, in the resulting graph. We mention here that m < 3n — 6 in 
the case of planar graphs [9]. The graphs of Class C3 are maximal planar graphs 
and they have almost the maximum number of allowable edges. So, in this case 
we prefer an intermediate value on m and we choose m = 1.5n 

Class C2: triangulated_planar_graph(G, n); which produces planar graphs 
without any cycle of size at least four. 

Class C3: maximal_planar_graph(G, n); which produces a maximal planar 
graph with n vertices. 

The planar graphs produced by the procedures of both classes C2 and C3 
are initially generated by the procedure of class C1, i.e. they are random planar 
graphs, and then these procedures add edges appropriately so that the graphs 
become triangulated planar and maximal planar, respectively. 

In the second set, which we call S2, we take each of the graphs obtained from 
LEDA in classes C1, C2, C3 and create a new graph by modifying the original 
graph, in order to get graph instances that are harder to colour for the considered 
algorithms. Their hardness has the characteristics of the hard graph instances 
discussed in Section 3. Hence, if G is a planar graph from set S1 and k is 
a positive integer, then the planar graph G' is obtained from G by replacing each 
of its edges e with a set of k new vertices and joining each 
of the new vertices with the endpoints of edge e. 
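
The construction of G' can be sketched in a few lines. The following Python fragment is only an illustration and not the code used in the experiments; the function name `replace_edges` and the edge-list representation (vertices numbered 0 to n - 1) are assumptions made here.

```python
def replace_edges(n, edges, k):
    """Replace every edge (u, v) by k new vertices, each joined to u and v.

    Vertices are assumed to be numbered 0..n-1; new vertices receive the
    next free numbers. Returns the new vertex count and the new edge list.
    """
    new_edges = []
    next_vertex = n
    for u, v in edges:
        for _ in range(k):
            new_edges.append((u, next_vertex))
            new_edges.append((next_vertex, v))
            next_vertex += 1
    return next_vertex, new_edges


# A triangle with k = 2: 3 + 2*3 = 9 vertices and 2*2*3 = 12 edges.
n_new, new_edges = replace_edges(3, [(0, 1), (1, 2), (0, 2)], 2)
```

In general the result has n + k·m vertices and 2k·m edges, and the degree of every original vertex is multiplied by k.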

In the last set, S3, we create graphs obtained from the icosahedron (20dron) 
(see Fig. 2(a)) by replacing each of its edges with a set of new vertices. The size 
of each of these sets takes one of the values in $K = \{k_1, k_2, \ldots, k_5\}$. We remark 
here that the icosahedron is a 5-regular planar graph with 30 edges. We also 
require that the five edges incident to each of its vertices are replaced by 
sets of vertices of all five sizes $k_1$ to $k_5$. With this condition the 
degree of each vertex of the icosahedron becomes $\Delta$ in the resulting graph (so 
each vertex of $G$ belongs to two cliques of maximum size in $G^2$. This is a hard 
case, as explained in Section 3). 

To specify the size of the set of vertices which replaces each edge of the 20dron 
we implemented an exhaustive algorithm which edge-colours the 20dron with five 
colours and returns all possible edge colourings of this graph [1]. This algorithm 
returns 780 different edge colourings. Given an edge colouring, we 
replace each edge of colour $i$ with a set of new vertices of size $k_i$ ($1 \le i \le 5$). 
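
An exhaustive enumeration of proper edge colourings can be sketched with simple backtracking. The Python generator below is an assumption-laden illustration, not the implementation referred to above: the name `proper_edge_colourings` is hypothetical, colourings that differ only by a renaming of the colours are counted separately, and the sketch makes no attempt to reproduce the figure of 780 reported for the icosahedron.

```python
def proper_edge_colourings(edges, num_colours):
    """Yield every proper edge colouring as a list with one colour per edge.

    A colouring is proper if edges sharing an endpoint get different colours.
    """
    colouring = [None] * len(edges)
    used = {}  # vertex -> colours already present on its coloured edges

    def extend(index):
        if index == len(edges):
            yield list(colouring)
            return
        u, v = edges[index]
        taken = used.setdefault(u, set()) | used.setdefault(v, set())
        for colour in range(num_colours):
            if colour in taken:
                continue
            colouring[index] = colour
            used[u].add(colour)
            used[v].add(colour)
            yield from extend(index + 1)
            used[u].discard(colour)   # backtrack
            used[v].discard(colour)

    yield from extend(0)


# Small check on the complete graph K4, which is 3-regular.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
count_k4 = sum(1 for _ in proper_edge_colourings(k4, 3))
```

Given one such colouring, an edge of colour i would then be replaced by a vertex set of size k_i, as described above.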

The size n of the planar graph instances of set S1 is 1024. In most past 
experimental works the size of the input graphs does not exceed 1024 [13]. We 
have also done experiments on smaller (e.g. n = 256) and bigger graphs (e.g. n = 2048) 
and we observed that the asymptotic behaviour of the algorithms considered 
remains the same. So, in this sense, our experiments can be considered large 
enough and representative. 

For graphs of set S2 we performed tests on the squares of the graphs obtained 
from the graphs of set S1 with size n = 256 and k = 6. In most cases the value 
of n', that is the number of vertices of the resulting graph, is $3n \cdot 6 > 4000$. As 
shown in Tables 1 and 2, the graphs of set S2, which are expected to be harder 
instances to colour than the corresponding graphs of set S1, do not change the 
behaviour of the algorithms w.r.t. their behaviour on the squares of the graphs 
of set S1. Thus, we conclude that our experiments are representative. 

We also performed some special experiments on the graphs of set S3. If 
$k_1 = k_2 = k_3 = k_4 = k_5$, then we denote by $k$ this common value. We produce 
graphs with k = 20 and k = 100. The 20dron has only 30 edges, so it can be handled 
even with such a large value of k. From our results we conclude that the number 
of colours used by each algorithm considered grows in proportion to 
the value of k, while their performance remains the same. Thus, the results on 
these values of k are representative. 

In the case where the values $k_i$ ($1 \le i \le 5$) are distinct, we have done 
experiments for the case where $k_1 = x$, $k_2 = 2x$, $k_3 = 3x$, $k_4 = 4x$, $k_5 = 5x$, 
for $x = 5, 15$. These results are of greater interest for our algorithm, MDsatur, in 
contrast to the results of the previous case, which are more interesting for the 
rest of the algorithms and especially for FNPS. 

5 Experimental Results 

We split this section into two parts. In the first one, we present the experimental 
results of the colouring algorithms RFF, MaxIScover, MDsatur and FNPS, 
which colour squares of planar graphs. In the second part, we present the results 
of the radiocolouring algorithm RC, when it has as input a planar graph $G$ 
and a colouring of $G^2$ obtained from each one of the above algorithms. The 
following tables show: a) the average number of colours used by each of 
the above algorithms on the squares of 10 planar graphs from each of the classes 
C1, C2, C3; b) the average maximum degrees of the corresponding planar graphs. 
We need this information in order to be able to compare 
a good estimate of the optimal colouring of each of these graphs (i.e. $\Delta + 1$) 
with the solutions produced by each one of the above algorithms; c) the value 
of $k$ or the value of $x$, depending on whether the $k_i$'s have the same or different values 
($1 \le i \le 5$). 

5.1 Colouring the Square of a Planar Graph 

The results of the application of the colouring algorithms considered 
here on SQPG are shown in Tables 1-3. Table 1 shows the results on 
the squares of the graphs of set S1 and Table 2 the results on the squares of the 
graphs of set S2. In Table 3 we consider the squares of the graphs of set S3, 
and Table 4 shows the ER of each of the algorithms considered on these 
graphs. The most important observation on the results shown in Table 1 is 
that each one of the considered algorithms uses $\Delta$ + constant colours to colour 

Table 1 
planar graph class          max degree   Greedy   FindMIS   MDsatur   FNPS 
random_planar_graph         60.4         63       61        60.4      61 
triangulated_planar_graph   24.1         26       28        24.5      25 
maximal_planar_graph        120.1        121      120.5     120.5     121 


Table 2 
planar graph class          max degree   Greedy   FindMIS   MDsatur   FNPS 
random_planar_graph         188.5        188.5    188.5     188.5     189 
triangulated_planar_graph   99.4         102      101.2     99.8      100.5 
maximal_planar_graph        295          295.5    295       295       296 


Table 3 
          max      Greedy                  FindMIS           MDsatur                       FNPS 
          degree   min %  max %            min %  max %      min %  max %                  min %  max % 
k = 20    100      115 - 100%              105-115 - 100%    102-104 90%   120 10%         135-140 - 100% 
k = 100   500      540-560 - 100%          510-550 - 100%    502-505 90%   600 10%         690-705 - 100% 
x = 5     75       85-96 80%   100 20%     80-85 - 100%      77-82 50%   90-94 0.06%       101 - 100% 
x = 15    225      247-260 - 100%          240-260 - 100%    227-247 50%   270-281 0.06%   300 - 100% 


Table 4 
          max      Greedy                  FindMIS             MDsatur                           FNPS 
          deg.     min %  max %            min %  max %        min %  max %                      min %  max % 
k = 20    100      1.15 - 100%             1.05-1.15 - 100%    1.02-1.04 90%   1.20 10%          1.35-1.40 - 100% 
k = 100   500      1.1 - 100%              1.02-1.10 - 100%    1.004-1.01 90%   1.20 10%         1.38-1.41 - 100% 
x = 5     75       1.13-1.28 80%  1.3 20%  1.06-1.13 - 100%    1.01-1.09 50%   1.2-1.25 0.06%    1.35 - 100% 
x = 15    225      1.09-1.15 - 100%        1.08-1.15 - 100%    1.02-1.1 50%   1.2-1.25 0.06%     1.33 - 100% 



Fig. 3. Table 1 shows the average results of the 
colouring algorithms RFF, MaxIScover, MDsatur and FNPS on 10 planar 
graphs from set S1 with 1024 vertices each. Table 2 shows the average results 
of the above colouring algorithms on 10 planar graphs from set S2 with 
256 vertices each and k = 6. Table 3 shows the average results of 
these algorithms on the icosahedron when k = 20, 100 and x = 5, 15. Finally, 
Table 4 displays the experimental ratios (ER) of each of the above algorithms 
on the results shown in Table 3. 



the square of any planar graph which belongs to set S1. Actually this constant 
term is very small, i.e. 4. This very good result gives the first indication that 
it may be possible to obtain colouring algorithms that colour the square of each 
planar graph with fewer than $1.66\Delta$ colours (which is currently the best known 
result). In order to further support this assumption we obtain results on squares 
of planar graphs that are harder to colour, i.e. the squares of the graphs of sets S2 and S3. 

Table 2 shows the results on the squares of the graphs of set S2. From 
these results we conclude that the colouring algorithms we consider still colour 
the squares of the graph instances of set S2 very well. There is no significant 
difference in the number of colours used by each one of the algorithms. 
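
To make the measured quantities concrete, the following Python sketch builds the square of a graph, colours it with a plain first-fit greedy rule and reports the ratio of colours used to the maximum degree of the original graph. It is a generic illustration only: the first-fit rule and the names `square` and `greedy_colouring` are assumptions made here and do not reproduce RFF, MaxIScover, MDsatur or FNPS.

```python
def square(adj):
    """Square of a graph given as {vertex: set of neighbours}.

    Two distinct vertices are adjacent in the square iff their distance
    in the original graph is 1 or 2.
    """
    sq = {v: set(nbrs) for v, nbrs in adj.items()}
    for v, nbrs in adj.items():
        for w in nbrs:
            sq[v] |= adj[w]
        sq[v].discard(v)
    return sq


def greedy_colouring(adj, order):
    """First-fit colouring: each vertex gets the smallest free colour."""
    colour = {}
    for v in order:
        taken = {colour[u] for u in adj[v] if u in colour}
        c = 1
        while c in taken:
            c += 1
        colour[v] = c
    return colour


# A star with four leaves: maximum degree 4, and its square is K5.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
sq = square(star)
colours = greedy_colouring(sq, sorted(sq))
used = len(set(colours.values()))
max_degree = max(len(nbrs) for nbrs in star.values())
ratio = used / max_degree
```

For the star, the square is a clique on five vertices, so any proper colouring needs $\Delta + 1 = 5$ colours.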

As we mention in the introduction, the graphs obtained from LEDA are expected 
to be “non-extremal” graphs, because of the randomness in the way they 
are constructed (see Section 4). From the results we conclude that there does 
not exist a clique of size greater than $\Delta + 3$ in these graphs, because all the 
algorithms considered colour them with $\Delta$ + constant colours. Because of this 
fact the results obtained on these graphs are reasonable (ER about 1), because 
as we can see from Tables 3 and 4 almost all the algorithms considered have very 
good ER even on graph instances that are hard to colour. 

The last experiments have been done on the squares of graphs obtained from 
the 20dron, i.e. the squares of the graphs of set S3. In the case where k = 20, 
our algorithm MDsatur uses $1.02\Delta$ to $1.04\Delta$ colours to colour 90% of 
the 780 graphs tested. We recall here that we create one graph for each edge 
colouring of the 20dron. In only 10% of these squares it uses $1.20\Delta$ colours. When 
k = 100 the behaviour of our algorithm improves: it uses at most $1.004\Delta$ to 
$1.01\Delta$ colours to colour 90% of the graphs tested. This is reasonable, because 
MDsatur actually uses a number of colours that exceeds $\Delta + 1$ by a constant, 
so the greater the value of $\Delta$, the smaller the ratio between them 
(i.e. $\frac{\Delta + c}{\Delta + 1}$). 
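
As a worked check of this effect, the MDsatur entries of Table 3 for 90% of the graphs can be divided by the maximum degree, which is how the ratios in Table 4 appear to be obtained (an assumption based on comparing the two tables):

```latex
\[
k = 20:\quad \frac{102}{100} = 1.02, \quad \frac{104}{100} = 1.04,
\qquad
k = 100:\quad \frac{502}{500} = 1.004, \quad \frac{505}{500} = 1.01 .
\]
```

The additive excess over $\Delta$ stays within a few colours in both cases, so the ratio shrinks as $\Delta$ grows from 100 to 500.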

Algorithm FNPS uses in almost all cases $1.35\Delta$ to $1.40\Delta$ colours in order to 
colour the 780 graphs tested in the case where k = 20. In the case where k = 100 
it uses slightly more colours, i.e. $1.38\Delta$ to $1.41\Delta$. 

When the $k_i$'s have different values we examine the cases where x = 5 and x = 15. 
In these cases we observe that the performance of our algorithm gets worse, but it is 
still better than the performance of the other algorithms. Algorithm MDsatur 
gives results similar to the case where all $k_i$'s have the same value in only 50% 
of the tested graphs (i.e. it uses $1.01\Delta$ to $1.09\Delta$ colours). We observe that it uses 
at most $1.25\Delta$ colours to colour 0.06% of the graphs tested. The rest of these 
graphs are coloured with $1.1\Delta$ to $1.19\Delta$ colours by MDsatur. The number of colours 
used on each of these graphs is uniformly distributed in this range. On the other hand, 
the performance of FNPS becomes better in this case. 

The results presented in Tables 3 and 4 show the worst behaviour of algorithms 
MDsatur and FNPS on the squares of the graphs of set S3. For our 
algorithm these results are obtained when we name the vertices of $G$ randomly. 
On the other hand, when we label these vertices sequentially (namely, nearby 
vertices in an embedding of $G$ receive consecutive integers as index), then it 
colours almost all (about 100%) of the graphs of set S3 with $\Delta + c$ colours. The 
behaviour of algorithm FNPS is exactly the opposite. FNPS colours almost all the 
squares of the graphs of set S3 with at most $1.2\Delta$ colours when the vertices are 
randomly named, in contrast to $1.4\Delta$ in the other case. 

Finally, on the squares of the graphs of set S3 both greedy heuristics considered 
(RFF, MaxIScover) colour the square of each such planar graph with ER 
at most 1.15 in almost all the cases. The first of these heuristics, RFF, uses the 
same number of colours, that is $1.15\Delta(G)$, in almost each of the 780 graphs tested 
(for each value of k). The second heuristic, MaxIScover, uses $1.02\Delta$ to $1.15\Delta$ 
colours, uniformly spread in this range, on the squares of these graphs. 

For the case where the values of the $k_i$'s are not the same, the performance of 
RFF gets worse, especially when x = 5. In this case it uses $1.3\Delta$ colours in 
20% of the graphs tested. Observe that its performance is better when x = 15. 
The heuristic MaxIScover has almost the same results as in the above case. 

At this point we discuss the experimental performance of algorithm AH. 
This algorithm colours the square of a planar graph with maximum degree at 
least 749. Because of this condition it is obvious why we do not present results 
for this algorithm. On the other hand, in order to study its behaviour rather than 
the exact number of colours that it uses, we did experiments increasing the 
value of the variable inductiveness (see the pseudo code). From these results we 
can conclude that it behaves similarly to algorithm FNPS. Namely, when 
it selects the next vertex to be coloured at random (among the vertices which 
satisfy the proper condition, i.e. have fewer than $1.8\Delta$ neighbours in the already 
coloured subgraph of $G^2$), then it colours even the modified icosahedron with 
$k = \Delta/5$ with $1.2\Delta$ colours in almost all cases. Observe that exactly this graph 
gives the tightness of its performance, as we explain in Section 3. 

5.2 RadioColouring a Planar Graph 

In this paragraph we note that our algorithm RadioColouring min order min 
span radiocolours each of the planar graphs $G$ used in the experiments shown 
in Tables 1-3 using at most four extra colours in addition to the colours 
used by any of the algorithms RFF, MaxIScover, MDsatur and FNPS to 
colour $G^2$. 

The similarity in its behaviour on each of the sets of graphs used and on each 
of the colourings of their squares allows us to make the following conjecture: 
$X_{span}(G) - X_{order}(G) \le c$, where $c$ is a small constant. 

Abstract. In this paper we experimentally evaluate the performances 
of some natural online algorithms for the bicriteria version of the classical 
Graham’s scheduling problem. In such a setting, jobs are characterized 
by a processing time and a memory size. Every job must be scheduled 
on one of the m processors so as to minimize the time makespan and the 
maximum memory occupation per processor simultaneously. We consider 
four fundamental classes of algorithms obtained by combining known 
single-criterion algorithms according to different strategies. The perfor- 
mances of such algorithms have been evaluated according to real world 
sequences of jobs and to sequences generated by fundamental probabil- 
ity distributions. As a conclusion of our investigation, three particular 
algorithms have been identified that seem to perform significantly better 
than the others. One has been presented in [4] and is the direct bicriteria 
extension of the basic Graham’s greedy algorithm, while the other ones 
are given by two different combinations of the Graham’s algorithm and 
the Albers’ algorithm proposed in [1]. 



1 Introduction 

In recent years considerable research activity has been devoted to multicriteria 
optimization problems, in which a feasible solution must be simultaneously 
evaluated with respect to different criteria and/or cost functions. For instance, 
one can ask for the determination of a spanning tree of a graph whose global 
cost is low with respect to two different weightings of the edges [17, 22], or such 
that the diameter is low with respect to the first weighting and has a low global cost 
with respect to the second one [17]. 

Many scheduling problems have been investigated under a multicriteria op- 
timization point of view. As an example, in [24] the authors considered the 
problem of minimizing the makespan on one machine among the solutions with 
minimum maximum lateness, in [23] scheduling algorithms have been presented 
for optimizing the average completion time given an upper bound or budget on 

* Work supported by the IST Programme of the EU under contract number IST- 
1999-14186 (ALCOM-FT), by the EU RTN project ARACNE, by the Italian project 
REAL-WINE, partially funded by the Italian Ministry of Education, University and 
Research, and by the Italian CNR project CNRG003EE8 (AL-WINE). 

the possible makespan, and in [5, 25] the authors dealt with the simultaneous 
minimization of the makespan and total weighted completion time. Other results 
with respect to various multicriteria objectives can be found in [9, 12, 13, 14, 
18, 20, 21, 25, 27]. A survey on the multicriteria scheduling literature up to ’93 
is given in [19]. 

In the bicriteria extension of the classical Graham’s problem jobs are char- 
acterized by a processing time and a memory size and have to be scheduled so 
as to simultaneously minimize the time makespan and the maximum memory 
occupation per processor. Such a problem has been first considered in [4], where 
among the others an online algorithm has been presented whose two competitive 
ratios are both always less than 3. However, no results are given concerning its 
practical behavior. 

In this paper we propose four natural classes of online algorithms for the 
bicriteria scheduling problem that include the one of [4]. They are obtained 
by combining, according to different strategies, the single-criterion algorithms of 
Graham [10], Bartal et al. [3], Karger et al. [16] and Albers [1], which achieve, 
in that order, competitive ratios of 2, 1.985, 1.945 and 1.923. 

We evaluate the performances of the presented bicriteria algorithms accord- 
ing to real world sequences of jobs and to sequences generated by fundamental 
probability distributions. Our investigation shows that the results heavily de- 
pend on the characteristic of the particular sequence. However, three algorithms 
seem equivalent and to outperform all the others in all the different cases. One 
corresponds to [4], while the other ones are obtained by two different combina- 
tions of the Graham’s and Albers’s algorithms. 

The paper is organized as follows. In the next section we introduce the basic 
notation and the various bicriteria algorithms. In Section 3 we give a detailed 
description of the experiments. In Section 4 we discuss the obtained results and 
finally, in Section 5, we give some conclusive remarks. 



2 Definitions and Notation 

In this section we introduce the basic notation and the bicriteria algorithms 
used in the experimental analysis. 

We denote by $m$ the number of machines and by $\sigma = \langle p_1, \ldots, p_n \rangle$ the 
input sequence. Each job $p_j$, $1 \le j \le n$, is characterized by a pair of costs 
$(t_j, s_j)$, where $t_j$ represents the time required to process job $p_j$ and $s_j$ its memory 
occupation. 

A scheduling algorithm $A$ is online if it assigns each job $p_j \in \sigma$ to one of 
the $m$ machines without knowing any information about the future jobs. We 
denote as $T_j^i(A)$ the completion time of machine $i$ after job $p_j$ is scheduled 
according to $A$ and as $T_j$ the sum of the processing times of the first $j$ jobs, 
that is, $T_j = \sum_{h=1}^{j} t_h$, or analogously $T_j = \sum_{i=1}^{m} T_j^i(A)$. Similarly, let $S_j^i(A)$ 
be the memory occupation on machine $i$ after job $p_j$ is scheduled according 
to $A$ and $S_j$ be the total memory occupation of the first $j$ jobs. The time and 
the memory costs of algorithm $A$ are, respectively, $t(A, \sigma) = \max_{i=1}^{m} T_n^i(A)$ and 
$s(A, \sigma) = \max_{i=1}^{m} S_n^i(A)$. For the sake of simplicity, when the algorithm $A$ is 
clear from the context, we will drop $A$ from the notation. 

Let $t^*(\sigma)$ be the minimum makespan required to process $\sigma$, independently 
of the memory. Analogously, let $s^*(\sigma)$ be the minimum memory occupation 
per machine required by $\sigma$, ignoring the time. 

Definition 1. An algorithm $A$ is said to be $c$-competitive if, for all possible sequences 
$\sigma$, $t(\sigma) \le c \cdot t^*(\sigma)$ and $s(\sigma) \le c \cdot s^*(\sigma)$. 

Let us briefly describe the single-criterion online algorithms used in our 
bicriteria extensions. Let $w_j$ be the cost of the job $p_j$, let $M_j^i$ be the machine with 
the $i$-th smallest load after the first $j$ jobs have been scheduled, $1 \le i \le m$, 
let $l_j^i$ be its load, and let $A_j^i$ be the average load on the $i$ smallest loaded 
machines after the first $j$ jobs have been scheduled. Then the single-criterion 
algorithms are defined as follows: 

Graham’s algorithm: Assign job $p_j$ to the least loaded machine, i.e. $M_{j-1}^1$. 

Bartal’s algorithm: Set $k = \lfloor 0.445m \rfloor$ and $\epsilon = 1/70$. Assign job $p_j$ to 
machine $M_{j-1}^{k+1}$ if $l_{j-1}^{k+1} + w_j \le (2 - \epsilon) A_{j-1}^{k}$ (basic choice), otherwise assign $p_j$ to 
the machine with the smallest load (alternative choice). 

Karger’s algorithm: Set $\alpha = 1.945$. Assign job $p_j$ to the highest loaded 
machine $M_{j-1}^{k}$ (that is, with the highest $k$) such that $l_{j-1}^{k} + w_j \le \alpha \cdot A_{j-1}^{k-1}$ (basic 
choice). If there is no such machine, assign $p_j$ to the machine with the smallest 
load (alternative choice). 

Albers’s algorithm: Set $c = 1.923$, $k = \lfloor m/2 \rfloor$ and $r = 0.29m$. Set 
$\alpha = \frac{(c-1)k - r/2}{(c-1)(m-k)}$. Let $L_l$ be the sum of the loads on machines $M_j^1, \ldots, M_j^k$ if $p_j$ is 
scheduled on the smallest loaded machine. Similarly, let $L_h$ be the sum of the 
loads obtained considering the machines $M_j^{k+1}, \ldots, M_j^m$ if $p_j$ is scheduled on the 
smallest loaded machine. Let $\delta^m$ be the makespan if $p_j$ is scheduled on the 
machine with the $(k+1)$-st smallest load. Assign $p_j$ to the smallest loaded machine 
if at least one of the following conditions holds: (a) $L_l \le \alpha \cdot L_h$; (b) $\delta^m > c \cdot \frac{L_l + L_h}{m}$ 
(basic choice). Otherwise, assign $p_j$ to the machine with the $(k+1)$-st smallest 
load (alternative choice). 
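
The two simplest rules in the list above, Graham's and Karger's, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code used in the experiments: the function names are hypothetical, machine loads are kept in a plain list, and ties between equally loaded machines are broken by machine index.

```python
def graham_choice(loads, cost):
    """Graham: index of the least loaded machine (the job cost is not used)."""
    return min(range(len(loads)), key=lambda i: loads[i])


def karger_choice(loads, cost, alpha=1.945):
    """Karger: the highest loaded machine whose new load stays within alpha
    times the average load of the machines below it (basic choice);
    otherwise the least loaded machine (alternative choice)."""
    order = sorted(range(len(loads)), key=lambda i: loads[i])
    choice = order[0]                      # alternative choice
    prefix = 0.0
    for rank in range(1, len(order)):
        prefix += loads[order[rank - 1]]
        average_below = prefix / rank      # average of the `rank` smallest loads
        if loads[order[rank]] + cost <= alpha * average_below:
            choice = order[rank]           # keep the highest rank that qualifies
    return choice


def schedule(costs, m, choose):
    """Assign the jobs one by one with the given rule; return the final loads."""
    loads = [0.0] * m
    for cost in costs:
        loads[choose(loads, cost)] += cost
    return loads


graham_loads = schedule([3, 3, 2, 2, 2], 2, graham_choice)
```

With the five jobs above on two machines, Graham's rule ends with loads 7 and 5.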

Notice that these last three algorithms maintain a list of machines ordered by 
non-decreasing load. In most of their bicriteria extensions, a total order still has 
to be maintained and is obtained by means of a single cost measure given by a 
proper linear combination of the time and memory loads of each machine. 
To this aim, we observe that a simple combination that at each step $j$ assigns 
to each processor $i$ a combined load $C_j^i = T_j^i + S_j^i$ might be extremely 
compromising for one cost measure when the other is characterized by substantially 
bigger values. In particular, this is true when the respective optima are very 
different. However, equal optima can be obtained by scaling up one of the two 
cost measures, thus guaranteeing a substantial equivalence. More precisely, if 
$\sigma_j$ is the subsequence of the first $j$ jobs of $\sigma$, it suffices to define $C_j^i = T_j^i + \lambda_j S_j^i$ 
with $\lambda_j = \frac{t^*(\sigma_j)}{s^*(\sigma_j)}$. Unfortunately, computing $t^*(\sigma_j)$ and $s^*(\sigma_j)$ is an intractable 
problem. However, the following lower bounds on $t^*(\sigma_j)$ and $s^*(\sigma_j)$ provide 
approximations that rapidly converge to $t^*(\sigma_j)$ and $s^*(\sigma_j)$ as $j$ increases. Let 
$t_1, \ldots, t_j$ be the processing times of the first $j$ jobs listed in non-increasing 
order, that is in such a way that $t_1 \ge \ldots \ge t_j$. Then $t^*(\sigma_j) \ge L_t(\sigma_j) = 
\max\{T_j/m,\ t_1,\ 2 t_{m+1},\ 3 t_{2m+1},\ \ldots,\ h\, t_{(h-1)m+1}\}$, where $h$ is the largest integer 
such that $(h-1)m + 1 \le j$. The lower bound $L_s(\sigma_j)$ on $s^*(\sigma_j)$ can be defined 
accordingly, so that $\lambda_j = \frac{L_t(\sigma_j)}{L_s(\sigma_j)}$ can be easily calculated at step $j$. 
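
The lower bound and the resulting scaling factor are straightforward to compute. The sketch below is illustrative only; the names `lower_bound` and `scaling_factor` are hypothetical, jobs are assumed to be given as (time, memory) pairs, and memory sizes are assumed to be strictly positive so that the quotient is defined.

```python
def lower_bound(costs, m):
    """Lower bound on the optimal maximum load of m machines:
    max{ total/m, c_1, 2*c_{m+1}, 3*c_{2m+1}, ... } with the costs
    sorted in non-increasing order c_1 >= c_2 >= ..."""
    ordered = sorted(costs, reverse=True)
    bound = max(sum(ordered) / m, ordered[0])
    h = 2
    while (h - 1) * m + 1 <= len(ordered):
        # ordered[(h - 1) * m] is c_{(h-1)m+1} in 1-based numbering
        bound = max(bound, h * ordered[(h - 1) * m])
        h += 1
    return bound


def scaling_factor(jobs, m):
    """Approximate lambda_j as the quotient of the two lower bounds."""
    times = [t for t, _ in jobs]
    sizes = [s for _, s in jobs]
    return lower_bound(times, m) / lower_bound(sizes, m)


bound = lower_bound([5, 4, 3, 3, 3], 2)
lam = scaling_factor([(4, 1), (1, 4), (4, 1)], 2)
```

The term $h \cdot t_{(h-1)m+1}$ reflects that among the $(h-1)m + 1$ largest jobs at least $h$ must share a machine. Re-sorting at every step is the simplest option; an incremental order structure would be a natural refinement for long sequences.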

Let us now introduce the different families of algorithms. 



2.1 Bicriteria Graham Based Algorithms 

All the algorithms belonging to this family select at each step the subset of the $k$ 
machines with the lowest time load, where $k$ is a parameter at most equal to 
[ whose value has been determined during the experimental analysis so as to 
minimize the maximum competitive ratio. The particular machine among the k 
selected ones chosen to schedule the current job is then determined according to 
the memory loads by applying one of the above single-criterion algorithms, thus 
obtaining corresponding bicriteria algorithms that we call respectively 
Graham-Graham, Graham-Bartal, Graham-Karger and Graham-Albers. The algorithm 
Graham-Graham corresponds to the one of [4]. 

The idea behind this family is that of improving the performance of Graham- 
Graham by exploiting the better behavior of the other single-criterion algorithms 
when applied to real world cases (see [2]). 
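
As an illustration of this family, Graham-Graham can be sketched as a two-stage selection: first by time load, then by memory load. The function name, the (time, memory) job representation and the tie-breaking by list order are assumptions of this sketch, and $k$ is simply passed in as a parameter.

```python
def graham_graham(jobs, m, k):
    """Among the k machines with the lowest time load, pick the one with the
    lowest memory load. Returns the assignment and the final loads."""
    time_load = [0.0] * m
    mem_load = [0.0] * m
    assignment = []
    for t, s in jobs:
        candidates = sorted(range(m), key=lambda i: time_load[i])[:k]
        chosen = min(candidates, key=lambda i: mem_load[i])
        time_load[chosen] += t
        mem_load[chosen] += s
        assignment.append(chosen)
    return assignment, time_load, mem_load


jobs = [(4, 1), (1, 4), (4, 1), (1, 4)]
only_time = graham_graham(jobs, 2, 1)   # k = 1: Graham's rule on time alone
only_mem = graham_graham(jobs, 2, 2)    # k = m: Graham's rule on memory alone
```

The two extreme values of $k$ show the trade-off: with $k = 1$ the memory loads are ignored, while with $k = m$ the time loads no longer influence the choice.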

2.2 Weak Bicriteria Algorithms 

These algorithms are obtained by reducing each bicriteria instance to a 
single-criterion one. More precisely, this is accomplished by assigning each job $p_j = 
(t_j, s_j)$ a cost $w_j = t_j + \lambda_j s_j$, with $\lambda_j$ determined as above, and then by 
applying the above single-criterion algorithms assuming at each step $j$ a combined 
load $C_{j-1}^i$ for each machine $i$, $1 \le i \le m$. We call the corresponding 
algorithms Weak-Graham, Weak-Bartal, Weak-Karger and Weak-Albers. 
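
A possible reading of Weak-Graham is sketched below. Several details are assumptions of this sketch and are not fixed by the description above: the scaling factor is recomputed from the lower bounds at every step and applied to the current raw loads, ties go to the machine with the smallest index, and all names are hypothetical. Since Graham's rule ignores the cost of the incoming job, only the combined loads matter here.

```python
def lower_bound(costs, m):
    """max{ total/m, c_1, 2*c_{m+1}, 3*c_{2m+1}, ... } for sorted costs."""
    ordered = sorted(costs, reverse=True)
    bound = max(sum(ordered) / m, ordered[0])
    h = 2
    while (h - 1) * m + 1 <= len(ordered):
        bound = max(bound, h * ordered[(h - 1) * m])
        h += 1
    return bound


def weak_graham(jobs, m):
    """Schedule each job on the machine with the smallest combined load
    T^i + lambda * S^i, where lambda is the current scaling factor."""
    time_load = [0.0] * m
    mem_load = [0.0] * m
    times, sizes, assignment = [], [], []
    for t, s in jobs:
        times.append(t)
        sizes.append(s)
        lam = lower_bound(times, m) / lower_bound(sizes, m)
        combined = [time_load[i] + lam * mem_load[i] for i in range(m)]
        chosen = min(range(m), key=lambda i: combined[i])
        time_load[chosen] += t
        mem_load[chosen] += s
        assignment.append(chosen)
    return assignment, time_load, mem_load


assignment, time_load, mem_load = weak_graham([(4, 1), (1, 4), (4, 1), (1, 4)], 2)
```

On this small input the combined measure balances the sum of the two loads per machine rather than each load separately, which illustrates why the single combined cost is called a weak reduction.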

2.3 Half Bicriteria Algorithms 

This family extends all the single-criterion algorithms, except Graham’s, by 
considering at each step $j$ the total order induced by the combined loads $C_j^i$, 
$1 \le i \le m$. Then, if the basic choice can be performed both on the time and 
memory measures, the job is scheduled on such a machine, otherwise on the unique 
one given by the alternative choice. We denote as Half-Bartal, Half-Karger and 
Half-Albers the corresponding algorithms. 

2.4 Strong Bicriteria Algorithms 

As in the previous case, this family extends all the single-criterion algorithms 
except Graham’s, and the basic choice is made by considering the order induced 
by the combined loads. However, the alternative choice is performed according 
to the bicriteria algorithm Graham-Graham, which in some sense chooses the 
smallest loaded machine in a bicriteria way. We call Strong-Bartal, Strong-Karger 
and Strong-Albers the corresponding algorithms. 

The main reason leading to the definition of this family is that, as observed 
during our experiments, the basic choice is taken in only 10% of the 
steps. Thus, strong bicriteria algorithms are very close to Graham-Graham and 
might improve on it when basic choices correspond to better selections. 

Strong-Albers deserves particular attention. In this algorithm, the $(k+1)$-st 
smallest loaded machine ($k = \lfloor m/2 \rfloor$) corresponding to the alternative choice 
is, for each single measure, rather close to the one chosen by Graham-Graham. 
In fact, Graham-Graham selects a machine whose time and memory loads are 
always below the average. Such a machine tends to be not far from the $(k+1)$-st 
position in both orderings that can be obtained according to the two 
measures. 

3 Description of the Experiments 

The experiments performed to investigate the behavior of the various bicriteria 
algorithms belong to two different classes, according to whether the sequences 
of the jobs come from real systems or are generated according to probability 
distributions. 

In the first case, the traces used in the tests are taken from two different 
systems. One sequence consists of traces obtained from the log files of the San 
Diego Supercomputer Center (SDSC) [8]. The second sequence is relative to the 
Computer CM-5 installed at the Los Alamos National Lab (LANL) and is part 
of the NPACI job traces repository [11]. 

The following tables summarize the main characteristics of the two traces 
respectively for the time and memory measures. They will be useful for the 
interpretation of our results. 



System   Year        Numb. of jobs   Time Avg   Min Time   Max Time    Variation 
SDSC     1998-2000   67,667          84,079     1          7,345,105   4 
LANL     1994-1996   201,387         1,000      1          25,200      2 

System   Year        Numb. of jobs   Mem. Avg   Min Mem.   Max Mem.   Variation 
SDSC     1998-2000   67,667          15,034     1          360,000    1.3 
LANL     1994-1996   201,387         3,677      1          29,124     1.1 


The sequences of the second class have been generated according to the 
following probability distributions: uniform, exponential, Erlang, hyperexponential 
and Bounded-Pareto [15, 26]. Each distribution is characterized by 
corresponding parameters that, following the suggestions in [6, 7], have been 
properly set in order to get realistic sequences and to cover a great range of 
values. The most common distributions used to model the processing times of 
the jobs in computing systems are the exponential, hyperexponential and Erlang 
distributions. However, for the sake of completeness, we have also considered the 
uniform distribution. In addition, we included the Bounded-Pareto distribution, 
which seems to be close to real world instances [2]. 

Similar considerations hold also for the memory sizes of the jobs, although 
in this case values tend to be more uniform and with a lower variation with 
respect to the processing times. As a consequence, the hyperexponential and 
Bounded-Pareto distributions are less realistic. 

Since we are randomly generating both processing times and memory sizes, 
we also consider different combinations of the above distributions. In this 
setting we restricted ourselves to the most reasonable and most widely used 
combinations, that is hyperexponential and Bounded-Pareto for the time, and 
uniform, exponential and Erlang for the memory (again see [6, 7] for technical motivations). 

The algorithms have been tested on sequences of 10,000 jobs and their current 
maximum competitive ratio has been computed after having scheduled each job, 
although the competitive ratios have been evaluated using the lower bounds 
$L_t(\sigma_j)$ and $L_s(\sigma_j)$ as approximations of $t^*(\sigma_j)$ and $s^*(\sigma_j)$. The number of 
machines has been set to 10, 50, 100, 200 and 500. 
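
The measurement described above can be sketched as follows. The code assumes that the "current maximum competitive ratio" after job $j$ is the larger of the two quotients, maximum time load over $L_t(\sigma_j)$ and maximum memory load over $L_s(\sigma_j)$; this reading, the function names and the input format (a list of (time, memory) jobs plus the machine index chosen for each job) are assumptions of the sketch.

```python
def lower_bound(costs, m):
    """max{ total/m, c_1, 2*c_{m+1}, 3*c_{2m+1}, ... } for sorted costs."""
    ordered = sorted(costs, reverse=True)
    bound = max(sum(ordered) / m, ordered[0])
    h = 2
    while (h - 1) * m + 1 <= len(ordered):
        bound = max(bound, h * ordered[(h - 1) * m])
        h += 1
    return bound


def competitive_ratios(jobs, assignment, m):
    """Approximate competitive ratio after each job, measured against the
    lower bounds instead of the (intractable) optima."""
    time_load = [0.0] * m
    mem_load = [0.0] * m
    times, sizes, ratios = [], [], []
    for (t, s), machine in zip(jobs, assignment):
        time_load[machine] += t
        mem_load[machine] += s
        times.append(t)
        sizes.append(s)
        time_ratio = max(time_load) / lower_bound(times, m)
        mem_ratio = max(mem_load) / lower_bound(sizes, m)
        ratios.append(max(time_ratio, mem_ratio))
    return ratios


ratios = competitive_ratios([(4, 1), (1, 4), (4, 1), (1, 4)], [0, 1, 1, 0], 2)
```

Because the lower bounds never exceed the true optima, the values obtained this way can only overestimate the real competitive ratios.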

4 Analysis of the Results 

In this section we present the experimental results of our tests and give a de- 
tailed analysis of the performances of the various bicriteria algorithms defined 
in section 2. 

Due to space limitations, in the following figures we give only the experiments 
for a number of machines m = 10 and m = 500. In every case, the results for 
m = 50 are close to the ones for m = 10, while there is a substantial equivalence 
for m = 100, m = 200 and m = 500. Moreover, for the sake of clarity, we 
depict only the curves representing the maximum competitive ratio of the best 
algorithm in each class, plus the one of Graham-Graham. 

4.1 Results for Real World Jobs 

Let us first discuss the results obtained for a small number of machines, that is 
for m = 10 and m = 50. 

Figure 1a shows the (maximum) competitive ratio of each algorithm applied 
to the first trace when m = 10. For all the algorithms it is possible to note 
a greater fluctuation of the competitive ratios for the first 2,000 jobs. In this 
interval, the oscillations are between the values 1.2 and 1.8. The only exception 
(even though it is not in the figure) is the Weak-Karger algorithm, for which the 
competitive ratio rapidly grows to reach the value 3. The bad behavior of this 
algorithm confirms the results in [2]. After the first 2,000 jobs, the competitiveness 
tends to become more stable and the best results are obtained, in order, 
by Strong-Albers, Weak-Graham, Graham-Graham, Graham-Bartal, 
Graham-Albers and Graham-Karger, with Strong-Albers, Weak-Graham and 
Graham-Graham very close to each other. For all these algorithms the competitive ratio 
is bounded between 1.1 and 1.2. 

Figure 2a shows the results for the second trace. The general trend is similar 
to the one of the first trace, even if the competitive ratios are significantly better 
as they tend to 1. It is also possible to note a higher stability of the competitive 
ratios and thus a faster convergence. The best performances are achieved, in 
order, by Graham-Albers, Graham-Graham, Strong-Albers, Weak-Graham, 
Graham-Bartal and Graham-Karger. 

For both traces, the best performance of the Graham based algorithms
is achieved when k=

Let us now consider the results for a high number of machines, that is for
m = 100, m = 200 and m = 500. The evolution of the competitive ratios is better
in the first trace, due to a longer transient phase in the second one (see
Figures 1b and 2b). However, the situation seems to change right after 10,000
jobs. At the end of the traces the competitive ratios are still greater than
1.4, while for a small number of machines they approached 1.1. For the first
trace the best algorithms are Strong-Albers and Graham-Graham, while for the
second one they are Graham-Albers, Graham-Bartal, Graham-Karger, Strong-Bartal,
Strong-Albers and Graham-Graham.

The best performance of the Graham based algorithms is achieved when k is 
close to 

Notice that all the executions correspond to the prefix of the first 10,000 jobs
of each trace. However, our experiments show that the results are completely
analogous when considering any subsequence of 10,000 jobs.

In order to understand our experimental results, it is necessary to recall
first the classification presented in [2] concerning the effects of each
incoming new job p_j on the competitive ratio of an online single-criterion
algorithm. In particular, three different effects have been identified:

1. If the cost of p_j is small with respect to the average load of the machines,
the job cannot have significant effects on the competitive ratio of an online
algorithm. This happens mainly at the end of the traces.

2. If the cost of p_j is of the same order as the average load of the machines,
the competitive ratio grows. This is a consequence of the fact that any online
algorithm tries to keep the load of the machines fairly balanced in order
to avoid too high a competitive ratio. Thus, the machine receiving p_j in
general reaches a load significantly higher than the average one. The optimal
algorithm instead will reserve a lightly loaded machine for this job, thus
causing a significant increase in the competitive ratio, which explains the
fluctuations observed in the tests when the number of jobs j is small.

3. If the cost of p_j is very large with respect to the average load of the
machines, the job will dominate both the online and the optimal solution. As
a consequence, the competitive ratio approaches the value 1.
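
As an illustration of effect 2 only, the following Python sketch reproduces the classical worst case of Graham's greedy rule in the single-criterion setting: many unit jobs followed by one job whose cost is close to the average load. The function name `greedy_makespan`, the instance and the lower bound used as a stand-in for the optimum are assumptions made for this example; the bicriteria algorithms tested in the paper are not reproduced here.

```python
import heapq

def greedy_makespan(jobs, m):
    """Graham's list scheduling: each job goes to the currently least loaded machine."""
    loads = [0] * m
    heapq.heapify(loads)
    for p in jobs:
        # Pop the lightest machine, add the job, push the new load back.
        heapq.heappush(loads, heapq.heappop(loads) + p)
    return max(loads)

m = 10
# Hypothetical instance: m(m - 1) unit jobs (effect 1), then one job of cost m (effect 2).
jobs = [1] * (m * (m - 1)) + [m]
online = greedy_makespan(jobs, m)
# No schedule can have a makespan below the average load or below the largest job.
lower_bound = max(sum(jobs) / m, max(jobs))
print(online, lower_bound, online / lower_bound)
```

On this instance the lower bound is attained by reserving one machine for the last job, so the printed ratio is the true competitive ratio of the greedy rule on the sequence, namely 2 - 1/m.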

Fig. 1a. Trace no. 1 on 10 machines. Fig. 1b. Trace no. 1 on 500 machines.
Fig. 2a. Trace no. 2 on 10 machines. Fig. 2b. Trace no. 2 on 500 machines.
Fig. 3a. Uniform distribution on 10 machines. Fig. 3b. Uniform distribution on 500 machines.
Fig. 4a. Time hyperexponential and memory uniform distributions on 10 machines. Fig. 4b. Time hyperexponential and memory uniform distributions on 500 machines.
Fig. 5a. Time Bounded-Pareto and memory uniform distributions on 10 machines. Fig. 5b. Time Bounded-Pareto and memory uniform distributions on 500 machines.
Fig. 6a. Trace no. 1: fluctuation for the first 100 jobs on 10 machines. Fig. 6b. Trace no. 2: fluctuation for the first 100 jobs on 10 machines.

At this point, it is easy to understand that the situation in which a sequence
of small jobs (effect 1) is followed by a job causing an effect of type 2 is the
worst for any Graham based algorithm. The other kinds of algorithms, instead,
in general tend to keep some of the machines lightly loaded in order to reserve
a portion of the resources for jobs causing an effect of type 2.

The probability that one of the three effects occurs and the magnitude of
its influence on the competitive ratio heavily depend on the characteristics of
the trace and of the tested algorithm. Obviously, if the variability of the
costs of the jobs is low, effects of type 2 and 3 are very unlikely to occur.
From the tables of the previous section it can be seen that the coefficient of
variation of trace 2 is lower than the one of trace 1, and this explains why on
this trace the competitive ratios of the algorithms are more stable and the
Graham based ones perform better. This is especially evident for a large number
of machines, such as m = 500.

After having analyzed all the different experiments, including the ones not
depicted in our figures, we can state that the algorithms that have shown the
best practical behavior are Graham-Graham, Graham-Albers and Strong-Albers,
with the last two performing slightly better. This is more evident on
the first trace, where the variability of the jobs is greater. More generally,
we can state that both the Weak-bicriteria and the Half-bicriteria algorithms
are dominated by the Graham-Graham algorithm, that the best performances are
achieved by the Graham based and Strong-bicriteria classes and, finally, that
Karger's algorithm does not behave well in any of its extensions.

4.2 Results on Probabilistic Sequences 

Let us now discuss the results obtained for the sequences of jobs generated 
according to probability distributions. 

Figure 3a shows the trend of the competitive ratios when m = 10 and the
jobs are generated using the uniform distribution. Except for the first 500
jobs, there is an almost total lack of oscillations. The best performance is
obtained, in order, by Graham-Albers, Strong-Albers, Graham-Bartal,
Graham-Graham, Weak-Graham and Graham-Karger. For these algorithms the
competitive ratios approach 1 already after the first 1,000 jobs. For the other
distributions the results are essentially the same, even if for the
hyperexponential and Bounded-Pareto distributions more jobs are needed before
the competitive ratios become stable.

Figure 3b shows the results for m = 500, again when jobs are generated using
the uniform distribution. As we expected, the ratios follow the same trend as
in the previous case, with a natural increase in the magnitude of the
oscillations and a slower convergence, this time toward 1.1. While the results
for the exponential and Erlang distributions are essentially the same, after
10,000 jobs the hyperexponential and Bounded-Pareto distributions are still not
stable and the ratios tend to be significantly higher, around 1.4 and 1.8
respectively.

Figures 4a, 4b, 5a and 5b concern the more interesting case of different
distributions for 10 and 500 machines. The depicted combinations are the time
hyperexponential and Bounded-Pareto distributions versus the memory uniform
one. When using the Erlang or exponential distributions for the memory instead
of the uniform one, the trends are almost coincident. For the time
hyperexponential distribution the best algorithms are, in order, Graham-Albers,
Strong-Karger, Graham-Graham and Strong-Albers, with ratios converging to 1
for m = 10 and 1.4 for m = 500. For the Bounded-Pareto distribution, the best
algorithms are Graham-Albers, Graham-Karger, Graham-Bartal and Graham-Graham,
with ratios converging to 1.2 for m = 10 and 1.4 for m = 500. It
is worth noting the good behavior of Weak-Albers for the test in Figure 5b,
especially on the first half of the sequence.

In all the above cases, the best Graham based algorithms are obtained for
k = ⌈m/2⌉ for a small number of machines, with k tending to ^ as m increases.

The key point for the interpretation of these results is again the coefficient
of variation of the sequences. It is well known that the uniform, the
exponential and the Erlang distributions have a coefficient not greater than 1,
thus generating sequences of jobs for which the competitive ratios of our
algorithms are quite stable. The fluctuations become relevant only for a large
number of machines. On the other hand, by using the hyperexponential or the
Bounded-Pareto distributions, it is possible to generate sequences of jobs
whose characteristics are similar to those of real world traces, that is with a
coefficient greater than 1, thus obtaining bigger fluctuations and a much
slower convergence. However, the algorithms with the best performances remain
the same as those identified for real world traces, thus confirming their good
behavior.
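
The statement about the coefficients of variation can be checked from the standard closed forms of the distributions involved. The sketch below is only an illustration: the function names and all parameter values are assumptions chosen for the example, not the parameters used in the experiments.

```python
from math import sqrt

def cv_uniform(a, b):
    """Coefficient of variation of the uniform distribution on [a, b], 0 <= a < b."""
    return (b - a) / (sqrt(3) * (a + b))

def cv_erlang(k):
    """Erlang distribution with k phases; k = 1 is the exponential distribution."""
    return 1 / sqrt(k)

def cv_hyperexponential(probs, rates):
    """Mixture of exponentials: branch i is taken with probs[i] and has rate rates[i]."""
    mean = sum(p / r for p, r in zip(probs, rates))
    second_moment = sum(2 * p / r ** 2 for p, r in zip(probs, rates))
    return sqrt(second_moment / mean ** 2 - 1)

print(cv_uniform(0, 1))                          # at most 1/sqrt(3)
print(cv_erlang(1), cv_erlang(4))                # never above 1
print(cv_hyperexponential([0.5, 0.5], [1, 10]))  # above 1 for distinct rates
```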

4.3 Results for j = 100 and m = 10

In this subsection we focus on short sequences in order to better understand the 
behavior of the competitive ratios during the transient phases in which they have 
big oscillations. Here we can also perform a tighter analysis since for a relatively 
small number of jobs it is possible to efficiently compute the optimal offline 
solutions. 

Figures 6a and 6b show the results achieved considering sequences of 100 jobs
on only 10 machines for the two real world traces. As to the performances of
the algorithms, besides Graham-Graham, Graham-Albers and Strong-Albers,
it is possible to observe a good behavior of Weak-Albers, Half-Bartal,
Strong-Bartal and Half-Albers. This suggests that the technique of reducing
to a single-criterion instance and running Bartal's or Albers's algorithm
works during the transient phases. However, in general, apart from the Karger
based algorithms, all the others are more or less equivalent. Often some
algorithms dominate the others at the beginning of the trace, but the situation
changes toward the end.

Similar results hold also for the sequences generated according to probability 
distributions. 
5 Conclusions

As a result of our experimental analysis, it is possible to draw the following
conclusions.

The performance of the algorithms depends heavily on the characteristics
of the sequences. Graham-Albers, Strong-Albers and Graham-Graham seem to
be the best algorithms, with the first two performing only slightly better
in a few cases.

As the number of machines increases, the transient phases tend to be
longer. In fact, the lower average load at each step makes the competitive
ratios more sensitive to effects of types 2 and 3 (see Section 4), thus giving
rise to bigger oscillations and slower convergence. As can be observed, the
competitive ratios
become stable when the ratio ^ becomes greater than 100. 

In general, the Graham based algorithms and the strong bicriteria ones (except
Graham-Karger and Strong-Karger) always behave well and outperform the other
families. The worse performance of the extensions of Karger's algorithm is not
surprising, as it aims to keep some processors more loaded than the others.

The performance of the Graham based algorithms is better when the parameter
k tends to ^ as m increases. In fact, from a worst case point of view,
setting k = ⌈m/2⌉ guarantees that the k machines with the lightest time load
all have a load of at most 2T_j/m, that is at most twice the average. However,
the single machine among these k is selected ignoring the time load, so that
the choice of a machine with time load close to 2T_j/m is not unlikely. The
situation is better for the memory measure. In fact, while we are only
guaranteed that among the k machines there is one with memory load at most
2S_j/m, as m increases the probability that there are machines with memory
load much less than 2S_j/m increases. Therefore, a selection based on the
memory loads in general yields a much better outcome than the worst case.
Decreasing k in general causes a better choice for the time and a worse one
for the memory. This explains why, experimentally, the outcome becomes
balanced for values of k decreasing as m increases.

We notice also that different families of algorithms are possible. However, all 
the ones that exploit single-criterion reductions for making selections in general 
have a worse performance. Weak-bicriteria algorithms are the worst ones and 
their behavior is satisfactory only during the transient phases. 

Surprisingly, in many cases the maximum competitive ratio tends to values
close to 1. It would be interesting to experimentally evaluate the performance
of these algorithms for d > 2 criteria and to estimate the increase of
the ratios as a function of d.

Clearly our experiments are not exhaustive, as they considered only the cases
that have been suggested as more realistic in the scientific literature. It
would be worthwhile to determine and test other realistic scenarios.
Abstract. We present a new variant of the Boyer-Moore string matching
algorithm which, though not linear, is very fast in practice.

We compare our algorithm with the Horspool, Quick Search, Tuned
Boyer-Moore, and Reverse Factor algorithms, which are among the
fastest string matching algorithms for practical uses. It turns out that
our algorithm achieves very good results in terms of both time efficiency
and number of character inspections, especially in the cases in which
the patterns are very short.

Keywords: string matching, experimental algorithms, text processing. 



1 Introduction 



Given a text T and a pattern P over some alphabet Σ, the string matching
problem consists in finding all occurrences of the pattern P in the text T. It
is a very extensively studied problem in computer science, mainly due to its
direct applications to several areas such as text processing, information
retrieval, and computational biology.

A very comprehensive description of the existing string matching algorithms
and a fairly complete bibliography can be found, respectively, at the following
URLs:

— http://www-igm.univ-mlv.fr/~lecroq/string/

— http://liinwww.ira.uka.de/bibliography/Theory/tq.html



We first introduce the notation and terminology used in the paper. We denote
the empty string by ε. A string P of length m is represented as an array
P[0..m − 1]. Thus, P[i] will denote the (i + 1)-st character of P, for
i = 0, ..., m − 1. For 0 ≤ i ≤ j < length(P), we denote by P[i..j] the substring
of P contained between the (i + 1)-st and the (j + 1)-st characters of P.
Moreover, for any i and j, we put

$$P[i..j] = \begin{cases} \varepsilon & \text{if } i > j \\ P[\max(i,0)..\min(j, \mathrm{length}(P) - 1)] & \text{otherwise.} \end{cases}$$



If P and P' are two strings, we write P' ⊒ P to indicate that P' is a suffix
of P, i.e., P' = P[i..length(P) − 1], for some 0 ≤ i ≤ length(P). Similarly we
write P' ⊑ P to indicate that P' is a prefix of P, i.e., P' = P[0..i − 1], for
some 0 ≤ i ≤ length(P).

K. Jansen et al. (Eds.): WEA 2003, LNCS 2647, pp. 47-58, 2003.
Domenico Cantone and Simone Faro

Let T be a text of length n and let P be a pattern of length m. If the character
P[0] is aligned with the character T[s] of the text, so that the character P[i]
is aligned with the character T[s + i], for i = 0, ..., m − 1, we say that the
pattern P has shift s in T. In this case the substring T[s..s + m − 1] is called
the current window of the text. If T[s..s + m − 1] = P, we say that the shift s
is valid.

Most string matching algorithms have the following general structure:

Generic_String_Matcher(T, P)
    Precompute_Globals(P)
    n = length(T)
    m = length(P)
    s = 0
    while s ≤ n − m do
        s = s + Shift_Increment(s, P, T)

where

— the procedure Precompute_Globals(P) computes useful mappings, in
the form of tables, which may be later accessed by the function
Shift_Increment(s, P, T);

— the function Shift_Increment(s, P, T) checks whether s is a valid shift and
computes a positive shift increment.

Observe that for the correctness of procedure Generic_String_Matcher
it is plainly necessary that the shift increment Δs computed by
Shift_Increment(s, P, T) is safe, namely no valid shift can belong to the
interval {s + 1, ..., s + Δs − 1}.

In the case of the naive string matching algorithm, for instance, the procedure
Precompute_Globals is just dropped and the function Shift_Increment(s, P, T)
always returns a unitary shift increment, after checking whether the current
shift is valid. The latter can be instantiated as follows:

Naive_Shift_Increment(s, P, T)
    for i = 0 to length(P) − 1 do
        if P[i] ≠ T[s + i] then
            return 1
    print(s)
    return 1



Therefore, in the worst case, the naive algorithm requires O(mn) character
comparisons.
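
A minimal Python rendering of the generic scheme with the naive shift increment may help to fix ideas. It is a sketch only: the function names mirror the pseudocode, while the extra `out` list, which collects the valid shifts instead of printing them, is an assumption introduced for the example.

```python
def naive_shift_increment(s, P, T, out):
    """Record s if it is a valid shift, then always advance by one position."""
    for i in range(len(P)):
        if P[i] != T[s + i]:
            return 1
    out.append(s)
    return 1

def generic_string_matcher(T, P, shift_increment):
    """Slide the pattern over the text by the increments the given function returns."""
    out = []
    n, m = len(T), len(P)
    s = 0
    while s <= n - m:
        s += shift_increment(s, P, T, out)
    return out

print(generic_string_matcher("abracadabra", "abra", naive_shift_increment))
```

Any other safe shift increment function with the same signature can be passed in place of `naive_shift_increment` without touching the main loop.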

Information gathered during the execution of the Shift_Increment(s, P, T)
function, in combination with the knowledge of P as suitably extracted by
procedure Precompute_Globals(P), can yield shift increments larger than 1 and
ultimately lead to more efficient algorithms. Consider for instance the case in
which Shift_Increment(s, P, T) processes P from right to left and finds
immediately a mismatch between P[m − 1] and T[s + m − 1], where additionally the
character T[s + m − 1] does not occur in P in any other position; then the shift
can safely be incremented by m. In the best case, when the above case occurs
repeatedly, it can be verified in sublinear time O(n/m) that the text T does
not contain any occurrence of P.

1.1 The Boyer-Moore Algorithm

The Boyer-Moore algorithm (cf. [BM77]) is a progenitor of several algorithmic
variants which aim at efficiently computing shift increments close to optimal.
Specifically, the Boyer-Moore algorithm can be characterized by the following
function BM_Shift_Increment(s, P, T) which, as in the previous example, scans
the pattern P from right to left. BM_Shift_Increment(s, P, T) computes the shift
increment as the maximum value suggested by the good suffix rule and the bad
character rule below, via the functions gs_P and bc_P respectively, provided
that both of them are applicable.

BM_Shift_Increment(s, P, T)
    for i = length(P) − 1 downto 0 do
        if P[i] ≠ T[s + i] then
            return max(gs_P(i + 1), i − bc_P(T[s + i]))
    print(s)
    return gs_P(0)



If a mismatch occurs at position i of the pattern P, while it is scanned from
right to left, the good suffix rule suggests aligning the substring
T[s + i + 1..s + m − 1] = P[i + 1..m − 1] with its rightmost occurrence in P
preceded by a character different from P[i]. If such an occurrence does not
exist, the good suffix rule suggests a shift increment which allows matching
the longest suffix of T[s + i + 1..s + m − 1] with a prefix of P.

More formally, if the first mismatch occurs at position i of the pattern P,
the good suffix rule states that the shift can be safely incremented by
gs_P(i + 1) positions, where

$$gs_P(j) =_{\mathrm{Def}} \min\{0 < k \le m \mid P[j-k..m-k-1] \sqsupseteq P \text{ and } (k \le j-1 \rightarrow P[j-1] \ne P[j-1-k])\},$$

for j = 0, 1, ..., m. (The situation in which an occurrence of the pattern P is
found can be regarded as a mismatch at position −1.)
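
The definition of the good suffix function can be evaluated literally, which is convenient for checking faster implementations on small inputs. The following Python sketch does exactly that in cubic time; the helper names `sub` and `good_suffix` are hypothetical, and the linear-time preprocessing is not shown.

```python
def sub(P, i, j):
    """P[i..j] with the convention used here: empty if i > j, otherwise clipped to P."""
    if i > j:
        return ""
    return P[max(i, 0): min(j, len(P) - 1) + 1]

def good_suffix(P):
    """Return the list gs[0..m] computed straight from the definition (for testing only)."""
    m = len(P)
    gs = []
    for j in range(m + 1):
        for k in range(1, m + 1):
            is_suffix = P.endswith(sub(P, j - k, m - k - 1))
            # The character condition applies only when k <= j - 1.
            differs = k > j - 1 or P[j - 1] != P[j - 1 - k]
            if is_suffix and differs:
                gs.append(k)
                break
    return gs

print(good_suffix("abab"))  # [2, 2, 2, 4, 1]
```

For instance, the entry 4 at j = 3 says that after matching only the last character of "abab" and failing on the one before it, no shift shorter than the whole pattern is possible.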

The bad character rule states that if c = T[s + i] ≠ P[i] is the first
mismatching character, while scanning P and T from right to left with shift s,
then P can be safely shifted in such a way that its rightmost occurrence of c,
if present, is aligned with position (s + i) of T. In the case in which c does
not occur in P, then P can be safely shifted just past position (s + i) of T.
More formally, the shift increment suggested by the bad character rule is given
by the expression (i − bc_P(T[s + i])), where

$$bc_P(c) =_{\mathrm{Def}} \max(\{0 \le k < m \mid P[k] = c\} \cup \{-1\}),$$

for c ∈ Σ, and where we recall that Σ is the alphabet of the pattern P and
text T. Notice that there are situations in which the shift increment given by
the bad character rule can be negative.
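
A small Python sketch makes the possibility of a negative increment concrete. The dictionary representation and the names `bad_character` and `bc_increment` are assumptions made for the example; characters that do not occur in P are mapped to −1 through the default value of the lookup.

```python
def bad_character(P):
    """Map each character of P to its rightmost position in P."""
    # Later positions overwrite earlier ones, so the maximum position survives.
    return {c: k for k, c in enumerate(P)}

def bc_increment(bc, i, c):
    """Increment suggested by the bad character rule for a mismatch at position i."""
    return i - bc.get(c, -1)

P = "abcab"
bc = bad_character(P)
print(bc_increment(bc, 4, "c"))  # 2: align the 'c' of P with the text character
print(bc_increment(bc, 4, "z"))  # 5: 'z' is not in P, shift past it
print(bc_increment(bc, 1, "a"))  # -2: the rightmost 'a' lies to the right of position 1
```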

It turns out that the functions gs_P and bc_P can be computed during the
preprocessing phase in time O(m) and O(m + |Σ|), respectively, and that the
overall worst-case running time of the Boyer-Moore algorithm, as described
above, is linear (cf. [GO80]).

For the sake of completeness, we notice that originally the Boyer-Moore
algorithm made use of a good suffix rule based on the following simpler function

$$gs'_P(j) =_{\mathrm{Def}} \min\{0 < k \le m \mid P[j-k..m-k-1] \sqsupseteq P\},$$

for j = 0, 1, ..., m, which led to a non-linear worst-case running time.

Several variants of the Boyer-Moore algorithm have been proposed over the
years. In particular, we mention the Horspool, Quick Search, Tuned Boyer-Moore,
and Reverse Factor algorithms, which are among the fastest variants in
practice (cf. [Hor80], [Sun90], [HS91], and [CCG+94], respectively). In Sect. 3,
we will compare them with our proposed variant of the Boyer-Moore algorithm.

1.2 The Horspool Algorithm 

Horspool suggested a simplification to the original Boyer-Moore algorithm,
defining a new variant which, though quadratic, performed better in practical
cases (cf. [Hor80]). He just dropped the good suffix rule and based the
calculation of the shift increments only on the following variation of the bad
character rule. Specifically, he observed that when the first mismatch between
the window T[s..s + m − 1] and the pattern P occurs at position 0 ≤ i < m and the
rightmost occurrence of the character T[s + i] in P is at position j > i, then
the bad character rule would shift the pattern backwards. Thus, he proposed to
compute the shift advancement in such a way that the rightmost character
T[s + m − 1] is aligned with its rightmost occurrence in P[0..m − 2], if present
(notice that the character P[m − 1] has been left out); otherwise the pattern
is advanced just past the window. This corresponds to advancing the shift by
hbc_P(T[s + m − 1]) positions, where

$$hbc_P(c) =_{\mathrm{Def}} \min(\{1 \le k < m \mid P[m-1-k] = c\} \cup \{m\}).$$

It turns out that the resulting algorithm performs well in practice and 
can be immediately translated into programming code (see Baeza-Yates and 
Regnier [BYR92] for a simple implementation in the C programming language). 
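
For illustration, the Horspool rule can be sketched in a few lines of Python. This is not the C implementation cited above: the names are hypothetical, and the window is compared with a slice rather than character by character from right to left, which does not affect the shifts.

```python
def horspool_table(P):
    """hbc for the characters of P[0..m-2]; all other characters default to m."""
    m = len(P)
    # k decreases during the construction, so the smallest k for each character survives.
    return {P[m - 1 - k]: k for k in range(m - 1, 0, -1)}

def horspool_search(T, P):
    """Return all valid shifts of a nonempty pattern P in T."""
    n, m = len(T), len(P)
    if m == 0:
        raise ValueError("the pattern must be nonempty")
    hbc = horspool_table(P)
    out, s = [], 0
    while s <= n - m:
        if T[s:s + m] == P:
            out.append(s)
        # The increment depends only on the last character of the window.
        s += hbc.get(T[s + m - 1], m)
    return out

print(horspool_table("abcab"))
print(horspool_search("abcabcab", "abcab"))
```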
1.3 The Quick-Search Algorithm

The Quick-Search algorithm, presented in [Sun90], uses a modification of the
original heuristics of the Boyer-Moore algorithm, much along the same lines as
the Horspool algorithm. Specifically, it is based on the following observation:
when a mismatching character is encountered, the pattern is always shifted to
the right by at least one character, but never by more than m characters. Thus,
the character T[s + m] is always involved in testing for the next alignment.
So, one can apply the bad character rule to T[s + m], rather than to the
mismatching character, obtaining larger shift advancements. This corresponds to
advancing the shift by qbc_P(T[s + m]) positions, where

$$qbc_P(c) =_{\mathrm{Def}} \min(\{0 < k \le m \mid P[m-k] = c\} \cup \{m+1\}).$$

Experimental tests have shown that the Quick-Search algorithm is very fast,
especially for short patterns (cf. [Lec00]).
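
The Quick-Search rule admits an equally short sketch. As before, the names are hypothetical and the window is compared with a slice; the explicit bound check before reading T[s + m] is an assumption of this example, needed because no sentinel is appended to the text.

```python
def quick_search_table(P):
    """qbc for the characters of P; all other characters default to m + 1."""
    m = len(P)
    # k decreases during the construction, so the smallest k for each character survives.
    return {P[m - k]: k for k in range(m, 0, -1)}

def quick_search(T, P):
    """Return all valid shifts of P in T using the character just past the window."""
    n, m = len(T), len(P)
    qbc = quick_search_table(P)
    out, s = [], 0
    while s <= n - m:
        if T[s:s + m] == P:
            out.append(s)
        if s + m >= n:
            break  # T[s + m] would lie past the end of the text
        s += qbc.get(T[s + m], m + 1)
    return out

print(quick_search_table("abcab"))
print(quick_search("abcabcab", "abcab"))
```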

1.4 The Tuned Boyer-Moore Algorithm 

The Tuned Boyer-Moore algorithm (cf. [HS91]) can be seen as an efficient
implementation of the Horspool algorithm. Again, let P be a pattern of length m.
Each iteration of the Tuned Boyer-Moore algorithm can be divided into two
phases: last character localization and matching phase. The first phase searches
for a match of P[m − 1], by applying rounds of three blind shifts (based on
the classical bad character rule) as long as needed. The matching phase then
tries to match the rest of the pattern P[0..m − 2] with the corresponding
characters of the text, proceeding from right to left. At the end of the
matching phase, the shift advancement is computed according to the Horspool bad
character rule. Moreover, in order to compute the last shifts correctly, the
algorithm first adds m copies of P[m − 1] at the end of the text, as a sentinel.

The fact that the blind shifts require no checks is at the heart of the very good
practical behavior of the Tuned Boyer-Moore algorithm, despite its quadratic
worst-case time complexity (cf. [Lec00]).

1.5 The Reverse Factor Algorithm 

Unlike the variants of the Boyer-Moore algorithm summarized above, the Reverse 
Factor algorithm computes shifts which match prefixes of the pattern, rather 
than suffixes. This is accomplished by means of the smallest suffix automaton 
of the reverse of the pattern, while scanning the text and pattern from right to 
left (for a complete description see [CCG+94]). 

The Reverse Factor algorithm has a quadratic worst-case time complexity,
but it is very fast in practice (cf. [Lec00]). Moreover, it has been shown that
on average it inspects O(n log(m)/m) text characters, reaching the best bound
shown by Yao in 1979 (cf. [Yao79]).
2 Fast-Search: A New Efficient Variant of the Boyer-Moore Algorithm

We now present a new efficient variant of the Boyer-Moore algorithm, called
Fast-Search, which will use the Fast_Search_Shift_Increment procedure to be
given below as shift increment function. As before, let P be a pattern of
length m and let T be a text of length n over a finite alphabet Σ; also, let
0 ≤ s ≤ n − m be a shift. The main observation upon which our Fast-Search
algorithm is based is the following:

the Horspool bad character rule leads to larger shift increments than
the good suffix rule if and only if a mismatch occurs immediately, while
comparing the pattern P with the window T[s..s + m − 1], namely when
P[m − 1] ≠ T[s + m − 1].

The above observation, which will be proved later in Sect. 2.1, suggests at 
once that the following shift increment rule should lead to a faster algorithm 
than the Horspool one: 

to compute the shift increment use the Horspool bad character rule, if 
a mismatch occurs during the first character comparison; otherwise use 
the good suffix rule. 

This translates into the following pseudo-code:

Fast_Search_Shift_Increment(s, P, T)
    m = length(P)
    for i = m − 1 downto 0 do
        if P[i] ≠ T[s + i] then
            if i = m − 1 then
                return hbc_P(T[s + m − 1])
            else
                return gs_P(i + 1)
    print(s)
    return gs_P(0)

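Notice that, for a mismatch at position m − 1, the bad character rule suggests
the increment (m − 1) − bc_P(a), which coincides with hbc_P(a) whenever
a ≠ P[m − 1], so that the term hbc_P(T[s + m − 1]) can be replaced by the bad
character increment for T[s + m − 1] in the above procedure, as will be done in
the efficient implementation of the Fast-Search algorithm to be given in
Sect. 2.2.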

Experimental data which will be presented in Sect. 3 confirm that the
Fast-Search algorithm is faster than the Horspool algorithm. In fact, we will
see that, though not linear, Fast-Search compares well with the fastest string
matching algorithms, especially in the case of short patterns. We also notice
that the functions gs_P and hbc_P can be precomputed in time O(m) and
O(m + |Σ|), respectively, by Precompute_Globals(P).
2.1 The Horspool Bad Character Rule versus the Good Suffix Rule

We will show in Proposition 1 that the Horspool bad character rule wins against 
the good suffix rule only when a mismatch is found during the first character 
comparison. To this purpose we first prove the following technical lemma. 

Lemma 1. Let P be a pattern of length m ≥ 1 over an alphabet Σ. Then the
following inequalities hold:

(a) gs_P(m) ≤ hbc_P(c), for c ∈ Σ \ {P[m − 1]};

(b) gs_P(j) ≥ hbc_P(P[m − 1]), for j = 0, 1, ..., m − 1.

Proof. Concerning (a), let c ∈ Σ \ {P[m − 1]} and let k = hbc_P(c). If k = m,
then gs_P(m) ≤ hbc_P(c) follows at once. On the other hand, if k < m, then we
have P[m − 1 − k] = c, so that P[m − 1 − k] ≠ P[m − 1], since by assumption
P[m − 1] ≠ c holds. Therefore gs_P(m) ≤ k = hbc_P(c), proving (a).

Next, let 0 ≤ j < m and let 0 < k ≤ m be such that

— P[j − k..m − k − 1] ⊒ P, and

— P[j − 1] ≠ P[j − 1 − k], provided that k ≤ j − 1,

so that gs_P(j) ≤ k. If k < m, then P[m − k − 1] = P[m − 1], and therefore
hbc_P(P[m − 1]) ≤ k. On the other hand, if k = m, then we plainly have
hbc_P(P[m − 1]) ≤ k. Since this bound holds for every such k, and in particular
for the smallest one, in any case gs_P(j) ≥ hbc_P(P[m − 1]), proving (b). □



Then we have: 

Proposition 1. Let P and T be two nonempty strings over an alphabet Σ and
let m = |P|. Let us also assume that we are comparing P with the window
T[s..s + m − 1] of T with shift s, scanning P from right to left. Then

(a) if the first mismatch occurs at position (m − 1) of the pattern P, then

    gs_P(m) ≤ hbc_P(T[s + m − 1]);

(b) if the first mismatch occurs at position 0 ≤ i < m − 1 of the pattern P, then

    gs_P(i + 1) ≥ hbc_P(T[s + m − 1]);

(c) if no mismatch occurs, then

    gs_P(0) ≥ hbc_P(T[s + m − 1]).

Proof. Let us first assume that P[m − 1] ≠ T[s + m − 1], i.e., the first mismatch
occurs at position (m − 1) of the pattern P, while comparing P with
T[s..s + m − 1] from right to left. Then by Lemma 1(a) we have
gs_P(m) ≤ hbc_P(T[s + m − 1]), yielding (a).

On the other hand, if P[m − 1] = T[s + m − 1], i.e., the first mismatch occurs
at position 0 ≤ i < m − 1 or no mismatch occurs, then Lemma 1(b) implies
immediately (b) and (c). □
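
The two inequalities of Lemma 1 can also be confirmed empirically on small inputs. The Python sketch below evaluates gs_P and hbc_P directly from their definitions and checks every pattern of length 1 to 6 over a three-letter alphabet; the function names and the chosen alphabet are assumptions of the example, and such a check complements the proof rather than replacing it.

```python
from itertools import product

def sub(P, i, j):
    """P[i..j]: empty if i > j, otherwise clipped to the bounds of P."""
    return "" if i > j else P[max(i, 0): min(j, len(P) - 1) + 1]

def gs(P, j):
    """Good suffix function, evaluated literally from its definition."""
    m = len(P)
    return next(k for k in range(1, m + 1)
                if P.endswith(sub(P, j - k, m - k - 1))
                and (k > j - 1 or P[j - 1] != P[j - 1 - k]))

def hbc(P, c):
    """Horspool bad character function, evaluated literally from its definition."""
    m = len(P)
    return min([k for k in range(1, m) if P[m - 1 - k] == c] + [m])

def lemma_holds(P, alphabet):
    m = len(P)
    part_a = all(gs(P, m) <= hbc(P, c) for c in alphabet if c != P[m - 1])
    part_b = all(gs(P, j) >= hbc(P, P[m - 1]) for j in range(m))
    return part_a and part_b

alphabet = "abc"
checked = all(lemma_holds("".join(p), alphabet)
              for m in range(1, 7) for p in product(alphabet, repeat=m))
print(checked)
```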
Fast-Search(P, T)
 1. n = length(T)
 2. m = length(P)
 3. T' = T.P
 4. bc_P = precompute-bad-character(P)
 5. gs_P = precompute-good-suffix(P)
 7. s = 0
 8. while bc_P(T'[s + m − 1]) > 0 do s = s + bc_P(T'[s + m − 1])
 9. while s ≤ n − m do
10.     j = m − 2
11.     while j ≥ 0 and P[j] = T'[s + j] do j = j − 1
12.     if j < 0 then print(s)
13.     s = s + gs_P(j + 1)
14.     while bc_P(T'[s + m − 1]) > 0 do s = s + bc_P(T'[s + m − 1])

Fig. 1. The Fast-Search algorithm



2.2 An Efficient Implementation 

A more effective implementation of the Fast-Search algorithm can be obtained
much along the same lines as the Tuned Boyer-Moore algorithm. The main idea
consists in iterating the bad character rule until the last character P[m − 1]
of the pattern is matched correctly against the text, and then applying the
good suffix rule at the end of the matching phase. More precisely, starting
from a shift position s, if we denote by j_i the total shift advancement after
the i-th iteration of the bad character rule, then we have the following
recurrence:

$$j_i = j_{i-1} + bc_P(T[s + j_{i-1} + m - 1]).$$

Therefore, starting from a given shift s, the bad character rule is applied k
times in a row, where k = min{i | T[s + j_i + m − 1] = P[m − 1]}, with a
resulting shift advancement of j_k. At this point it is known that
T[s + j_k + m − 1] = P[m − 1], so that the subsequent matching phase can start
with the (m − 2)-nd character of the pattern.

As in the case of the Tuned Boyer-Moore algorithm, the Fast-Search algorithm
benefits from the introduction of an external sentinel, which allows the last
shifts to be computed correctly with no extra checks. For this purpose, we have
chosen to add a copy of the pattern P at the end of the text T, obtaining a new
text T' = T.P. Plainly, all the valid shifts of P in T are the valid shifts s of
P in T' such that s ≤ n − m, where, as usual, n and m denote respectively the
lengths of T and P.

The code of the Fast-Search algorithm is presented in Fig. 1. 
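
The scheme of Fig. 1 can be transcribed into Python as follows. This is a sketch under stated assumptions, not the authors' C code: the bad character table is assumed to store the increment m − 1 − (rightmost position of c), so that it is 0 exactly for c = P[m − 1] and m for characters not occurring in P, and the good suffix table is computed by brute force from its definition rather than in linear time. All function names are hypothetical.

```python
def sub(P, i, j):
    """P[i..j]: empty if i > j, otherwise clipped to the bounds of P."""
    return "" if i > j else P[max(i, 0): min(j, len(P) - 1) + 1]

def good_suffix(P):
    """gs[0..m] straight from the definition (cubic time, adequate for a sketch)."""
    m = len(P)
    return [next(k for k in range(1, m + 1)
                 if P.endswith(sub(P, j - k, m - k - 1))
                 and (k > j - 1 or P[j - 1] != P[j - 1 - k]))
            for j in range(m + 1)]

def fast_search(T, P):
    """Return all valid shifts of a nonempty pattern P in T."""
    n, m = len(T), len(P)
    if m == 0:
        raise ValueError("the pattern must be nonempty")
    Tp = T + P  # sentinel: a copy of the pattern stops the blind shifts at s = n
    # Assumed table layout: increment for a mismatch on the last window character.
    shift = {c: m - 1 - k for k, c in enumerate(P)}
    gs = good_suffix(P)
    out, s = [], 0
    while shift.get(Tp[s + m - 1], m) > 0:
        s += shift.get(Tp[s + m - 1], m)
    while s <= n - m:
        # The last character of the window is known to match; check the rest.
        j = m - 2
        while j >= 0 and P[j] == Tp[s + j]:
            j -= 1
        if j < 0:
            out.append(s)
        s += gs[j + 1]
        while shift.get(Tp[s + m - 1], m) > 0:
            s += shift.get(Tp[s + m - 1], m)
    return out

print(fast_search("abracadabra", "abra"))
```

Because every increment is safe and s = n is a valid shift of P in the extended text, the blind shifts can never move past position n, which is why no bound check is needed inside the inner loops.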

3 Experimental Results 

In this section we present experimental data which allow us to compare the
running times and number of character inspections of the following string
matching algorithms in various conditions: Fast-Search (FS), Horspool (HOR),
Quick-Search (QS), Tuned Boyer-Moore (TBM), and Reverse Factor (RF).

All five algorithms have been implemented in the C programming language
and were used to search for the same strings in large fixed text buffers on a PC
with a 1.19 GHz AMD Athlon processor. In particular, the algorithms have been
tested on three Randσ problems, for σ = 2, 8, 20, and on a natural language text
buffer.

A Randσ problem consisted in searching a set of 200 random patterns over
an alphabet Σ of size σ, for each assigned value of the pattern length, in a
20Mb random text over the same alphabet Σ. We have performed our tests with
patterns of length 2, 4, 6, 8, 10, 20, 40, 80, and 160.

The tests on a natural language text buffer have been performed on a 3.13Mb 
file obtained from the WinEdt spelling dictionary by discarding non-alphabetic 
characters. All words in the text buffer have been searched for. 

In the following tables, running times are expressed in hundredths of seconds. 
Concerning the number of character inspections, these have been obtained by 
taking the average of the total number of times a text character is accessed, 
either to perform a comparison with a pattern character, or to perform a shift, 
or to compute a transition in an automaton, and dividing it by the total number 
of characters in the text buffer. 

Experimental results show that the Fast-Search algorithm obtains the best 
runtime performances in most cases and, sporadically, it is second only to the 
Tuned Boyer-Moore algorithm. 

Concerning the number of text character inspections, it turns out that the 
Fast-Search algorithm is quite close to the Reverse Factor algorithm, which gen- 
erally shows the best behaviour. We notice, though, that in the case of very 
short patterns the Fast-Search algorithm reaches the lowest number of character 
accesses. 

4 Conclusion 

We have presented a new efficient variant of the Boyer-Moore string matching 
algorithm, named Fast-Search, based on the classical bad character and good 
suffix rules to compute shift advancements, as other variations of the Boyer- 
Moore algorithm. 



Table 1. Running times for a Rand2 problem

σ = 2       2       4       6       8      10      20      40      80     160
HOR     46.05   44.75   44.77   45.12   44.83   42.10   41.23   40.83   42.13
QS      38.13   40.59   42.11   41.27   41.13   38.97   38.09   37.04   37.54
TBM     36.27   36.26   38.42   38.87   38.69   37.75   37.81   37.36   38.44
RF     268.38  197.88  149.83  120.14  100.02   60.37   37.91   28.40   22.63
FS      38.38   32.96   30.19   27.35   25.40   21.04   18.90   18.16   17.39

Table 2. Number of text character inspections for a Rand2 problem

σ = 2       2       4       6       8      10      20      40      80     160
HOR      1.83    1.72    1.66    1.66    1.64    1.59    1.64    1.61    1.68
QS       1.54    1.65    1.69    1.64    1.63    1.64    1.67    1.60    1.63
TBM      1.23    1.35    1.42    1.45    1.45    1.42    1.46    2.43    2.49
RF       1.43    1.06     .78     .62     .51     .29     .16     .09     .05
FS       1.00     .92     .80     .70     .63     .45     .34     .26     .22



Table 3. Running times for a Rand8 problem

σ = 8       2       4       6       8      10      20      40      80     160
HOR     30.22   21.99   21.85   18.62   18.04   17.27   17.24   17.11   17.38
QS      22.41   20.43   19.48   17.63   17.41   16.93   16.86   16.82   16.94
TBM     23.14   19.51   18.95   17.34   17.07   16.79   16.78   16.73   16.97
RF      120.5   74.29   63.99   48.61   42.84   29.16   22.23   19.71   16.48
FS      22.06   19.51   18.77   17.11   16.96   16.65   16.64   16.54   16.47



Table 4. Number of text character inspections for a Rand8 problem

σ = 8       2       4       6       8      10      20      40      80     160
HOR     1.191    .680    .507    .422    .374    .294    .282    .275    .281
QS       .842    .575    .456    .393    .358    .291    .282    .278    .285
TBM      .663    .386    .291    .245    .218    .174    .168    .164    .167
RF       .674    .381    .278    .225    .191    .112    .063    .360    .020
FS       .600    .348    .260    .217    .193    .150    .137    .126    .120



Table 5. Running times for a Rand20 problem

σ = 20      2       4       6       8      10      20      40      80     160
HOR     24.51   18.56   17.03   16.39   16.01   15.19   14.78   14.84   14.98
QS      19.16   17.16   16.19   15.77   15.51   14.93   14.70   14.67   14.69
TBM     19.12   16.68   15.80   15.48   15.25   14.79   14.64   14.57   14.79
RF      96.16   56.63   43.32   36.69   32.29   23.43   19.46   17.83   14.62
FS      19.11   16.67   15.78   15.43   15.26   14.74   14.58   14.55   14.51



Table 6. Number of text character inspections for a Rand20 problem

σ = 20      2       4       6       8      10      20      40      80     160
HOR     1.075    .566    .395    .311    .259    .161    .119    .106    .103
QS       .735    .463    .346    .282    .241    .156    .118    .107    .103
TBM      .563    .297    .208    .164    .137    .086    .064    .057    .055
RF       .565    .302    .214    .171    .143    .084    .049    .027    .014
FS       .538    .284    .198    .156    .131    .082    .060    .054    .051

Table 7. Running times for a natural language problem

NL          2       4       6       8      10      20      40      80     160
HOR      3.56    2.71    2.48    2.39    2.32    2.18    2.17    2.15    2.01
QS       2.77    2.48    2.38    2.33    2.23    2.19    2.16    2.14    1.99
TBM      2.81    2.47    2.32    2.27    2.23    2.21    2.15    2.19    1.91
RF      14.44    8.69    6.67    5.69    4.97    3.47    2.84    2.77    5.41
FS       2.85    2.39    2.27    2.27    2.20    2.15    2.13    2.12    1.93



Table 8. Number of text character inspections for a natural language problem

NL          2       4       6       8      10      20      40      80     160
HOR     1.094    .590    .418    .337    .282    .172    .111    .077    .059
QS       .759    .489    .375    .309    .261    .175    .125    .086    .069
TBM      .584    .318    .226    .182    .153    .096    .062    .044    .034
RF       .588    .321    .231    .185    .153    .084    .045    .024    .013
FS       .550    .299    .211    .171    .143    .087    .055    .038    .028



Rather than computing the shift advancement as the larger of the values
suggested by the bad character and good suffix rules, our algorithm repeatedly
applies the bad character rule until the last character of the pattern is matched
correctly, and then, at the end of each matching phase, it executes one
application of the good suffix rule.

It turns out that, though quadratic in the worst case, the Fast-Search
algorithm is very fast in practice and compares well with other fast variants of
the Boyer-Moore algorithm, such as the Horspool, Quick Search, Tuned
Boyer-Moore, and Reverse Factor algorithms, in terms of both running time and
number of character inspections.

Abstract. In this paper an on-line algorithm for the Rectangle Packing
Problem is presented. The method is designed to be able to accept or
reject incoming boxes to maximize efficiency. We provide a wide
computational analysis showing the behavior of the proposed algorithm as well
as a comparison with existing off-line heuristics.



1 Introduction 

Given a set J of two-dimensional rectangular-shaped boxes, where each box
j ∈ J is characterized by its width w_j and its height h_j, we consider the problem
of orthogonally packing a subset of the boxes, without overlapping, into a single
bounding rectangular area A of width W and height H, maximizing the ratio
ρ between the area occupied by the boxes and the total available area A, i.e.,
minimizing the wasted area of A. Clearly, ρ lies between 0 and 1. It is assumed
that the boxes cannot be guillotined and have fixed orientation, i.e., they
cannot be rotated; moreover, we assume that all input data are positive integers,
with w_j ≤ W and h_j ≤ H, for every j ∈ J.

The problem is known as the "Rectangle (or Two-Dimensional) Packing
Problem" (see, for example, [10], [18]), and has been shown to be NP-complete ([13]).
It is a special case of the Two-Dimensional Cutting Stock (or Knapsack)
Problem, where each box j has an associated profit p_j > 0 and the problem is to
select a subset of the boxes to be packed in a single finite rectangular bin
maximizing the total selected profit ([15]); clearly, in our case p_j = w_j · h_j.

The Rectangle Packing Problem appears in many practical applications in the
cutting and packing (transportation and warehousing) industry, e.g., in cutting
wood or foam rubber into smaller pieces, and in placing goods on a shelf; other
important applications are newspaper paging, VLSI floor planning, and also
GRID computing.

A natural application also occurs in multiprocessor task scheduling
problems, since scheduling tasks with shared resources involves two dimensions
(the resource and the time). If we consider the width as the processing time and
the height as the resource requirement (say, processors), we may represent a
multiprocessor task by a box. In particular, the Rectangle Packing Problem models
a multiprocessor task scheduling problem where a set of H processors arranged
in a linear configuration as in an array ([4]), and a set J of tasks are given, with
each task j ∈ J requiring h_j physically adjacent processors for a certain time w_j,
and the objective is to schedule as many tasks as possible without interruption
within a deadline W, maximizing the total processor busy time.

In the literature two versions of the problem have been investigated: the
off-line and the on-line version. While in the off-line version all the problem
data are known in advance, in the on-line version, where for example the
over-list paradigm is considered ([7]), the boxes (with their dimensions) arrive
from a list without any knowledge of further boxes; in particular, the boxes along
with their dimensions become known one by one, and when a new box is
presented it has to be decided whether it can be placed into a free rectangular
sub-area of A (with equal dimensions), i.e., whether it is accepted or rejected.
The on-line problem is to accept (place) or reject the incoming boxes,
maximizing the ratio between the occupied area and the total available area A.

Most of the contributions in the literature are devoted to the off-line
problem, which is solved using several approaches based on optimal algorithms.
The basic formulation issues and solution procedures for the two-dimensional
cutting stock problems were presented in [8]. Optimal algorithms for the
orthogonal two-dimensional cutting were proposed in [1] and in [8], but such
techniques may be impractical for large instances. For a complete presentation
of this problem the reader is referred to the survey by Lodi et al. in [15]. Various
approximation schemes have been proposed, e.g., in [9]. Heuristic approaches
have been considered in [10], [11] and in [18], where rejection is also considered.

The on-line case is investigated in many variants mostly deriving from the
bin packing problem. The problem of packing one-dimensional items in k ≥ 1 or
fewer active bins, where each bin becomes active when it receives its first item,
is solved with several techniques based on combining the so-called HARMONIC
Algorithm ([12]) and the FIRST and BEST FIT rules; the best approximation
ratio tends to 1.69103 as k tends to infinity. For the more general case
of the d-dimensional packing problem, a worst-case ratio equal to h_∞^d ≈ 1.69^d
was demonstrated ([5]), and for d = 2 the best value for the lower bound
currently known is 1.907 ([3]). Recently, for the on-line strip packing version, an
algorithm with modifiable boxes was developed in [16], where a 4-competitive
algorithm and a 1.73 lower bound on the value of the competitive ratio were
presented. Such results were developed using similar ideas from [17]. For a
survey on on-line algorithms for the Packing Problem and many other variants,
the reader is referred to the chapter by Csirik and Woeginger in [7]. To the best
of our knowledge, the on-line version of the problem we consider has not been
investigated, and no on-line algorithms with guaranteed competitive ratio are
known so far.

In this paper, we consider the on-line version of the Rectangle Packing
Problem in the paradigm of boxes arriving over list, and provide an on-line
algorithm. Computational results are given to show the performance of the
proposed algorithm, and a comparison with algorithms in [18] is provided. The
remainder of the paper is organized as follows. In Section 2, we sketch the on-line
algorithm, and in Section 3 computational results are provided. Section 4
concludes the paper with some final remarks.

2 The On-Line Algorithm 

Without loss of generality, we consider the boxes indexed according to the order
in which they are presented. We recall that given a list L = 1, 2, ..., n of such
boxes, an algorithm A considering L is said to be on-line if:

- A considers boxes in the order given by the list L;

- A considers each box i without knowledge of any box j, with j > i;

- A never reconsiders a box already considered.

The algorithm operates in n = |J| iterations. During the j-th iteration, in which
a sub-area A^(j-1) ⊆ A is available (free), box j is considered to be accepted
or rejected, and a new (possibly empty) free sub-area A^(j) of A is defined. Let
us consider the j-th iteration in detail. Let A^(j-1) be the non-assigned (free)
area of A, and {A_1^(j-1), ..., A_j^(j-1)} a given partition of A^(j-1), that is,
A_1^(j-1) ∪ ... ∪ A_j^(j-1) = A^(j-1) and A_p^(j-1) ∩ A_q^(j-1) = ∅ for
p ≠ q ∈ {1, ..., j}, where each A_k^(j-1) is a free rectangular area of A, of width
W_k^(j-1), height H_k^(j-1) and size equal to W_k^(j-1) · H_k^(j-1). See for
example Figure 1. Clearly, at the beginning, we have A^(0) = A_1^(0) = A.

The box j is accepted if there is a free rectangular area A_k^(j-1) ∈ {A_1^(j-1),
..., A_j^(j-1)} that may satisfy the requirement of j, that is, both W_k^(j-1) ≥ w_j
and H_k^(j-1) ≥ h_j; otherwise j is rejected. In particular, if j is accepted let
A_k^(j-1) be the smallest (in terms of size) free rectangular area satisfying the
requirement of j; ties are broken by selecting the smallest area A_k^(j-1) with the
minimum index k. When j is accepted (see Figure 2), a sub-area X_j (of height h_j
and width w_j) in the north-west corner of A_k^(j-1) is assigned to j, leaving two
free rectangular sub-areas, namely A_k^(j) and A_(j+1)^(j), of A_k^(j-1). In
particular, let A_k^(j) = A_k^(j-1) \ Y_j and A_(j+1)^(j) = Y_j \ X_j, where Y_j is
the rectangular sub-area of A_k^(j-1) with size min{w_j · H_k^(j-1), h_j · W_k^(j-1)},
located in the west side of A_k^(j-1) if its size is w_j · H_k^(j-1) and in the
north side otherwise; with this choice the size of A_k^(j) is not less than the size
of A_(j+1)^(j), and the considered free rectangular area A_k^(j-1) is reduced by a
minimal amount. If j is rejected, we consider A_k^(j) = A_k^(j-1) and
A_(j+1)^(j) = ∅. Finally, let A_h^(j) = A_h^(j-1) for each h ∈ {1, ..., j} \ {k}.
The value of the solution found by the on-line algorithm is:

$$\rho = \frac{\sum_{j\ \mathrm{accepted}} w_j \cdot h_j}{\min\{W \cdot H,\ \sum_{j=1}^{n} (w_j \cdot h_j)\}} \qquad (1)$$

Fig. 1. Assigned and free rectangular areas at the beginning of iteration (4)
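
The acceptance rule and the splitting of the selected free area can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the authors' C implementation: free areas are stored as hypothetical tuples (x, y, width, height) with y growing southwards, and the efficiency is computed as the accepted area divided by min{W · H, total area of all boxes}.

```python
def pack_online(W, H, boxes):
    """On-line packing with rejection; boxes is a list of (w, h) pairs.

    Returns (placed, rejected, rho), where placed holds (j, x, y, w, h).
    """
    free = [(0, 0, W, H)]  # free rectangles, indexed in creation order
    placed, rejected = [], []
    for j, (w, h) in enumerate(boxes):
        # Smallest free rectangle that fits; strict '<' keeps the minimum index on ties.
        best = None
        for k, (_, _, fw, fh) in enumerate(free):
            if fw >= w and fh >= h:
                if best is None or fw * fh < free[best][2] * free[best][3]:
                    best = k
        if best is None:
            rejected.append(j)
            continue
        x, y, fw, fh = free[best]
        placed.append((j, x, y, w, h))  # north-west corner of the chosen area
        if w * fh <= h * fw:
            # Cut a west strip of width w: the area keeps its east part,
            # and the new area is the rest of the strip below the box.
            free[best] = (x + w, y, fw - w, fh)
            free.append((x, y + h, w, fh - h))
        else:
            # Cut a north strip of height h: the area keeps its south part,
            # and the new area is the rest of the strip east of the box.
            free[best] = (x, y + h, fw, fh - h)
            free.append((x + w, y, fw - w, h))
    used = sum(w * h for _, _, _, w, h in placed)
    total = sum(w * h for w, h in boxes)
    rho = used / min(W * H, total) if total else 1.0
    return placed, rejected, rho
```

Empty (zero-area) rectangles are simply kept in the list; since all box dimensions are positive integers they can never be selected. Each iteration scans the list of free areas once, so n boxes are processed in quadratic time.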



3 Computational Results 

3.1 Setup of the Experiments 

The proposed algorithm has been tested both on random instances and on
benchmarks. For the randomly generated instances, we experimented with
n = 10, 20, 50 and 100 rectangular boxes, obtaining a total of 4 classes
of instances. For each class we have considered different test cases according to
different choices of two parameters, say w_max and h_max, being the maximum
width and maximum height, respectively, for the boxes, and different values for
the width W and the height H of the bounding rectangular area A; in
particular, we have considered h_max = 5, 10, 15 and w_max = 5, 10, 15, and 4
different rectangular bounding areas with the following values for the width W
and the height H: (W, H) ∈ {(15, 20), (20, 30), (25, 50), (15, 50)}, for a total of 36
test cases for each class. For each one of the 144 different test cases, we randomly
generated ten instances where the widths and heights of the boxes are uniformly
distributed in the intervals [1, w_max] and [1, h_max], respectively.
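
The experimental design above can be enumerated directly, which also confirms the count of 4 × 3 × 3 × 4 = 144 test cases. The following is a hypothetical sketch of an instance generator, not the authors' C generator; the names and the use of Python's `random` module are assumptions.

```python
import random

SIZES = [10, 20, 50, 100]                           # number of boxes n
MAX_DIMS = [5, 10, 15]                              # values for w_max and h_max
AREAS = [(15, 20), (20, 30), (25, 50), (15, 50)]    # bounding areas (W, H)


def test_cases():
    """All combinations (n, w_max, h_max, W, H) used in the random tests."""
    return [(n, w_max, h_max, W, H)
            for n in SIZES
            for w_max in MAX_DIMS
            for h_max in MAX_DIMS
            for (W, H) in AREAS]


def random_instance(n, w_max, h_max, rng):
    """n boxes with integer sides uniform in [1, w_max] x [1, h_max]."""
    return [(rng.randint(1, w_max), rng.randint(1, h_max)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only for reproducibility of the sketch
    cases = test_cases()
    print(len(cases))                        # 144 test cases
    print(random_instance(10, 5, 5, rng))    # one instance of the first case
```

With ten instances per test case this gives 1,440 random instances in total.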

As previously said, the proposed algorithm was also tested on some
benchmark instances publicly available on the web¹. The site contains rectangle
packing data sets that were introduced by Hopper and Turton [10]. As we
work in an on-line scenario we decided to consider boxes in the same order as
they appear in the data set. The results obtained by our algorithm on such
instances are compared with the results achieved on the same instance set by an
off-line heuristic algorithm with rejection proposed by Wu et al. ([18]), in which
a "quasi-human" rule based on an ancient Chinese philosophical approach has
been used. Basically, the rule, called "Less Flexibility First", is based on the
principle that empty corners inside the bounding area should be filled first, then
the boundary lines of the free area, and then other empty areas. In other words,
it states a priority order in selecting the empty space inside the bounding area
to be filled. Under this rule, the rectangle with less flexibility should be packed
earlier. In [18], Wu et al. implemented this principle in two heuristics, called
H1 and H2, which differ in that H2 is implemented in a stricter
manner, i.e., in each packing step only the box with "the least flexibility" will
be packed. H1 and H2 are among the most effective off-line procedures for the
rectangle packing problem.

¹ http://www.ms.ic.ac.uk/jeb/pub/stripl.txt

Our algorithm and the instance generator have been implemented in the C
language, compiled with GNU CC 2.8.0 with the -O3 option and tested on
a 600 MHz Pentium PC running Linux.



3.2 Experimental Results and Analysis 

In Figures 3-6 we summarize average results of the efficiency index ρ. It can be
noted that the lowest efficiency values are achieved by our algorithm on instances
with a small number of boxes whose dimensions are very close to those of the
bounding area. As soon as the boxes are of small size (i.e., with small width
and/or small height) the efficiency reaches a value of almost one. This is due to
the chance the algorithm has to accept and place "small" boxes in the bounding
area.

Fig. 3. Efficiency for W = 15 and H = 20

For the sake of completeness, we also list in Tables 1 and 2 the complete
results for the scenarios (W = 15, H = 20, n = 10) and (W = 15, H = 20,
n = 20), respectively, which are the cases where we obtained the worst results.
For these instances we also analyzed the number of rejected, i.e., non-accepted,
boxes. In Tables 1 and 2, the first two columns are the maximum width and
height of a box, respectively, the third column shows the average number of
rejected requests, and the last three columns are the minimum value of ρ, say
ρ_min, the average value of ρ, say ρ_ave, and the maximum value of ρ, say ρ_max,
respectively, computed over ten different instances.

The worst results are obtained for the cases (W = 15, H = 20, n = 10,
w_max = 15, h_max = 10) and (W = 15, H = 20, n = 10, w_max = 15, h_max = 15),
both in terms of number of rejected boxes and efficiency. This is explained
by the fact that in these cases we have to consider boxes of large size, requiring
almost the whole bounding area. A similar situation occurs for the cases
(W = 15, H = 20, n = 20, w_max = 5, h_max = 15) and (W = 15, H = 20,
n = 20, w_max = 10, h_max = 5), even though the efficiency is higher with respect
to the previous two cases with n = 10; this can be justified because with n > 20
we have more chances to efficiently use the available bounding area. As one can
expect, the worst cases to deal with are those with w_max and/or h_max values
close to W and H, respectively; indeed, in these cases we could have a very small
chance to place a box whose size is very close to the size of the whole bounding
area, especially if small requests have been accepted before, implying a reduction
of the size of the available area. Nevertheless, the algorithm seems to perform
well, providing almost always solutions with efficiency value ρ greater than 0.8.

Fig. 4. Efficiency for W = 20 and H = 30

Fig. 5. Efficiency for W = 25 and H = 50

Fig. 6. Efficiency for W = 15 and H = 50

In Table 3, we list the results of our on-line algorithm compared with the
results of the off-line heuristics presented by Wu et al. in [18] and tested on the
same instance set. As discussed in Section 3.1, Wu et al. presented two heuristic
algorithms, namely H1 and H2, in which two variants of the "Less Flexibility
First" principle are implemented. The data set is formed by 21 instances with a
number of boxes to be packed that ranges from 16 to 197 and a bounding
rectangular area size that ranges from 400 to 38,400. In Table 3 we list in the
columns, respectively, the instance ID, the number n of boxes in that instance,
the width of the bounding area, the height of the bounding area, the size of the
bounding area, the values of ρ, ρ_H1 and ρ_H2 (being the efficiency of our algorithm
and algorithms H1 and H2, respectively), the number of boxes rejected by our
algorithm, the number of rectangles rejected by algorithm H1 and the number of
rectangles rejected by algorithm H2.

Table 1. Results for W = 15, H = 20 and n = 10

w_max  h_max  Rejected  ρ_min  ρ_ave  ρ_max
    5      5       0.0  1.000  1.000  1.000
    5     10       0.0  1.000  1.000  1.000
    5     15       1.0  0.911  1.000  1.000
   10      5       0.0  1.000  1.000  1.000
   10     10       1.5  0.623  0.923  1.000
   10     15       3.0  0.731  0.822  0.913
   15      5       0.5  0.935  1.000  1.000
   15     10       3.4  0.547  0.561  0.570
   15     15       6.4  0.637  0.665  0.780

Table 2. Results for W = 15, H = 20 and n = 20

w_max  h_max  Rejected  ρ_min  ρ_ave  ρ_max
    5      5       0.0  1.000  1.000  1.000
    5     10       2.7  0.680  0.790  1.000
    5     15       5.8  0.687  0.731  0.767
   10      5       2.3  0.833  0.891  0.937
   10     10       6.0  0.783  0.829  0.937
   10     15       9.2  0.763  0.776  0.807
   15      5       7.2  0.797  0.832  0.840
   15     10      12.6  0.643  0.797  0.940
   15     15      14.5  0.863  0.884  0.947

Before passing to the analysis of the results, we remark that H1 and H2 are
off-line algorithms, contrary to our on-line algorithm. If we consider the values
of the efficiency measure ρ, H1 and H2 of course provide in general better results,
even if our algorithm provides the best possible value in 4 out of 21 instances;
on average, the value of ρ over the 21 benchmark instances provided by our
on-line algorithm is ρ_ave = 0.868, while for the off-line heuristics H1 and H2
the averages are 0.973 and 0.954, respectively. Nevertheless, H1 and H2 are not
efficient at all, requiring O(n^5 log n) and O(n^4 log n) computing time,
respectively, and the CPU time spent by these algorithms is very large (e.g., for
instances with n = 97, H1 and H2 take about one hour and more than 10 minutes,
respectively); note that instances with n = 196 and 197 are not even considered.
On the contrary, our algorithm is much more efficient, processing n boxes in
O(n^2) time, and the CPU time spent is negligible for all the test problems.

Moreover, if we compare the number of rejected boxes, several times our
algorithm shows the same or even better performance with respect to H1 and
H2. As can be inferred from the comparison, our algorithm:

- is better than H1 and H2 in 4 instances, i.e., instances prob1, prob4, prob10
and prob19, where the best possible result is achieved;

- gives the same number of rejected boxes as H1 in instances prob13, prob16
(in which it is also superior to H2) and prob17 (in which it equals H2, too);

- shows a worse behavior than H1 but better than or equal to H2 in instances
prob2, prob5, prob8, prob12;

- has a worse behavior than H1 and H2 in instances prob3, prob6, prob7, prob9,
prob11, prob14, prob15, prob18.

Note that a comparison with H1 is not possible for prob12 because the values
Rejected and ρ are not reported in [18].

Table 3. Performance comparison of our on-line algorithm with off-line
algorithms on benchmarks

                                                        Rejected
Instance ID    n    W    H    Area      ρ   ρ_H1   ρ_H2   Ours  H1  H2
prob1         16   20   20     400  1.000  0.980  0.925      0   2   2
prob2         17   20   20     400  0.744  0.980  0.925      2   1   2
prob3         16   20   20     400  0.870  0.975  0.937      2   1   1
prob4         25   40   15     600  1.000  0.993  0.951      0   1   3
prob5         25   40   15     600  0.827  1.000  0.958      1   0   1
prob6         25   40   15     600  0.795  1.000  0.991      3   0   1
prob7         28   60   30   1,800  0.846  0.993  0.977      3   1   1
prob8         29   60   30   1,800  0.764  0.991  0.941      3   1   3
prob9         28   60   30   1,800  0.798  0.992  0.960      2   1   1
prob10        49   60   60   3,600  1.000  0.990  0.976      0   1   2
prob11        49   60   60   3,600  0.849  0.997  0.948      4   1   3
prob12        49   60   60   3,600  0.736      -  0.987      3   -   3
prob13        73   60   90   5,400  0.898  0.997  0.976      1   1   4
prob14        73   60   90   5,400  0.800  0.999  0.978      6   1   2
prob15        73   60   90   5,400  0.874  0.991  0.988      4   2   3
prob16        97   80  120   9,600  0.922  0.997  0.966      1   1   4
prob17        97   80  120   9,600  0.840  0.962  0.966      3   3   3
prob18        97   80  120   9,600  0.899  0.994  0.986      5   2   3
prob19       196  160  240  38,400  1.000      -      -      0   -   -
prob20       197  160  240  38,400  0.884      -      -      5   -   -
prob21       196  160  240  38,400  0.896      -      -      9   -   -



It is worth noticing that, while instances prob19, prob20 and prob21 are
not processed by H1 and H2, our algorithm provides the best possible solution
for instance prob19. Moreover, our algorithm achieves ρ = 1 four times, i.e.,
a complete covering of the bounding area, while the worst performance ratio
achieved is ρ = 0.736 (instance prob12).

4 Conclusions 

In this paper, we presented an on-line algorithm for the Rectangle Packing
Problem able to accept or reject incoming boxes to maximize efficiency. We
performed a wide computational analysis showing the behavior of the proposed
algorithm. Performance results show that the on-line algorithm provides good
solutions in almost all the tested cases. Moreover, a comparison with existing
off-line heuristics confirms the promising performance of our algorithm.

Abstract. The notion of tree-decomposition has very strong theoretical
interest related to NP-Hard problems. Indeed, several studies show that
it can be used to solve many basic optimization problems in polynomial
time when the treewidth is bounded. So, given an arbitrary graph, its
decomposition and its treewidth have to be determined, but computing the
treewidth of a graph is NP-Hard. Hence, several papers present heuristics
with computational experiments, but for many instances of graphs,
the heuristic results are far from the best lower bounds.

The aim of this paper is to propose new lower and upper bounds for
the treewidth. We tested them on the well known DIMACS benchmark
for graph coloring, so we can compare our results to the best bounds of
the literature. We improve the best lower bounds dramatically, and our
heuristic method computes good bounds within a very small computing
time.



1 Introduction 

The notion of tree-decomposition has been introduced by Robertson and
Seymour in the context of their research on graph minors [15]. It is based on the
decomposition of the representative graph G = (V, E) into separating vertex
subsets, called separators, connected in a tree. The maximum size of a
separator minus one in an optimal tree-decomposition is called treewidth. With
this method, the exponential factor of the complexity depends only on the
treewidth tw(G) of the graph G and not on its number of vertices n. It uses the
property that several states of a subgraph can be summarized by the state of a
separator. So, it can be used to tackle problems on large graphs with a bounded
treewidth, using dynamic programming methods.

There are numerous applications of this method to classical optimization 
problems [13, 6], to probabilistic networks [10], to graph coloration [14], to the 
frequency assignment problem [11], etc. 

Computing the treewidth of a graph is NP-Hard [2]. Several exact methods
study the decision problem tw(G) ≤ k, but only for very small values of k
(see [2] for k = 1, 2, 3 and [3] for an arbitrary k). A way to tackle the problem
is to compute good solutions using heuristic methods. A recent paper proposes
new bounds and computational experiments, but the gap between the lower and
upper bounds remains large for several instances [12].

In this paper, we propose a method to improve the lower bound which uses
properties stated by Hans Bodlaender [4] and a new heuristic. We developed
these methods and tested them on the DIMACS benchmark for graph
coloring [1], so we can compare our results to previous ones. We dramatically
improve the values of the best lower bounds of the literature, and the results of
our heuristic are close on average to those obtained by the most recent ones,
within a far smaller computing time.

In section 2 we explain our notation and give the definition of treewidth. 
Section 3 is devoted to the lower bound. In section 4, we recall the notion of 
triangulated graphs, triangulation and elimination orderings, and we present 
a new heuristic developed to compute upper bounds for the treewidth. The 
computational experiments are presented in section 5 before concluding remarks 
and ideas for future works. 

2 Definitions and Notation 

Let G = (V, E) be an undirected graph with vertex set V and edge set E ⊆ V × V.
Let n = |V| and m = |E|. Let v be a vertex; u is a neighbor of v in G if [v, u] ∈ E.
The set of neighbors of v is called the neighborhood of v and is denoted N(v).
Let deg(v) = |N(v)| be the degree of v. A set of vertices Q is called a clique
if there is an edge between each pair of distinct vertices of Q. For U ⊆ V, the
subgraph induced by U is the graph G[U] = (U, E[U]) with E[U] = (U × U) ∩ E.
The maximum cardinality of a clique in G is denoted ω(G) and the minimum
coloring number of G, χ(G).

For distinct vertices v, u ∈ V, a chain is a sequence of distinct vertices
[v = v_1, v_2, ..., v_j = u] such that ∀i < j, [v_i, v_{i+1}] ∈ E. A cycle is a sequence of
distinct vertices [v = v_1, v_2, ..., v_j = v] such that ∀i < j, [v_i, v_{i+1}] ∈ E. A chord
is an edge between two non-consecutive vertices in a cycle. A graph is connected
if ∀v, u ∈ V, there is a chain between v and u. A tree T = (V, E) is a connected
graph with no cycle.

Definition 2.1. (See [15]) A tree-decomposition D_T of G = (V, E) is a pair
(X, T), where T = (I, F) is a tree with node set I and edge set F, and X =
{X_i : i ∈ I} is a family of subsets of V, one for each node of T, and:

D_T = ({X_i | i ∈ I}, T = (I, F)) such that

1. ∪_{i ∈ I} X_i = V

2. ∀[v, u] ∈ E, there is an X_i, i ∈ I, with v ∈ X_i and u ∈ X_i

3. ∀i, j, k ∈ I, if j is on the path from i to k in T, then X_i ∩ X_k ⊆ X_j

The width of a tree-decomposition is max_{i ∈ I} |X_i| - 1. The treewidth of
a graph G, denoted tw(G), is the minimum width over all possible
tree-decompositions of G.

A tree-decomposition D_T = (T, X) of G is minimal if removing any vertex v
from a subset X_i makes D_T violate one of the three properties of the
tree-decomposition.
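
The three properties of Definition 2.1 can be checked mechanically, as in the following sketch. It is an illustration only: the function and parameter names are hypothetical, the input `tree_edges` is assumed (and not verified) to describe a tree on the bag indices, and property 3 is tested in its equivalent form "for every vertex v, the nodes whose bags contain v induce a connected subtree of T".

```python
from collections import defaultdict


def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """bags: dict node -> set of vertices; tree_edges: edges of the tree T."""
    # Property 1: the bags cover V exactly.
    if set().union(*bags.values()) != set(vertices):
        return False
    # Property 2: both endpoints of every edge share some bag.
    for v, u in edges:
        if not any(v in X and u in X for X in bags.values()):
            return False
    # Property 3: the nodes containing a given vertex are connected in T.
    adj = defaultdict(set)
    for i, j in tree_edges:
        adj[i].add(j)
        adj[j].add(i)
    for v in vertices:
        nodes = {i for i, X in bags.items() if v in X}
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for j in adj[i]:
                if j in nodes and j not in seen:
                    seen.add(j)
                    stack.append(j)
        if seen != nodes:
            return False
    return True


def width(bags):
    """Width of a tree-decomposition: largest bag size minus one."""
    return max(len(X) for X in bags.values()) - 1
```

For the path a-b-c, the bags {a, b} and {b, c} joined by one tree edge form a tree-decomposition of width 1, which matches the fact that trees have treewidth 1.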

3 Lower Bounds 

Even in the most recent papers, the lower bounds for the treewidth are far 
smaller than the heuristic results for a large number of graphs. The aim of this 
section is to present an iterative method which improves the value of the best 
lower bounds. It exploits properties stated by Hans Bodlaender [4]. 

3.1 Maximum Clique Bound 

Proposition 3.1. (See [5]) Suppose ({X_i / i ∈ I}, T = (I, F)) is a
tree-decomposition of G = (V, E). If W ⊆ V forms a clique in G, then there
exists an i ∈ I with W ⊆ X_i.

Proposition 3.1 implies that ω(G) − 1 is a lower bound for tw(G). The
computation of this bound is NP-hard and thus it is unknown for a large number
of graphs. Moreover, the MMD bound [12] introduced below is always greater than
or equal to ω(G) − 1.

3.2 Maximum Minimum Degree Bound (MMD) [12] 

Consider an optimal tree-decomposition (X, T) of G. Let i ∈ I be a leaf of T
with predecessor j. As the decomposition can be taken minimal (cf. Section 2),
there exists a vertex v ∈ X_i \ X_j (otherwise the node i contains no new
information and can be deleted). By property 3, v belongs to no node other
than i, so property 2 in the definition of the tree-decomposition implies that
the neighbor set of v is included in X_i. Hence |X_i| ≥ |N(v)| + 1, and we have
tw(G) ≥ |N(v)|.

So, a lower bound can be computed by the following method [12]: at each
step of the algorithm, the vertex of G with the lowest degree is deleted and
its degree recorded (cf. Algorithm 3.1). The algorithm stops when the vertex
set V is empty. It returns the maximum degree recorded. This bound is always
greater than or equal to ω(G) − 1, and even to χ(G) − 1. Moreover, it is very
fast to compute.

3.3 Improving the MMD Bound 

Properties stated in [4] can be used to find better lower bounds for the treewidth. 
We use the concept of improved graph defined by Hans Bodlaender in his pa- 
per [4]. 



New Lower and Upper Bounds for Graph Treewidth 






Algorithm 3.1 Maximum Minimum Degree Algorithm

for v ∈ V do
    M(v) := N(v)
end for
S := V
MMD := 0
while S ≠ ∅ do
    v* := argmin_{v ∈ S} |M(v)|
    if |M(v*)| > MMD then
        MMD := |M(v*)|
    end if
    for v ∈ M(v*) do
        M(v) := M(v) − {v*}
    end for
    S := S − {v*}   {v* is deleted}
end while
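
As an illustration, Algorithm 3.1 can be sketched in a few lines of Python. This is not the authors' C implementation; the function name and the representation of the graph as a dictionary mapping each vertex to its set of neighbors are assumptions made for this example.

```python
def mmd(adj):
    """Maximum Minimum Degree bound, following Algorithm 3.1.

    adj maps every vertex to the set of its neighbors (undirected graph).
    """
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}  # M(v) := N(v)
    best = 0
    while remaining:
        # Vertex of minimum degree in the current graph.
        v = min(remaining, key=lambda x: len(remaining[x]))
        best = max(best, len(remaining[v]))
        # Delete v and update the neighborhoods of its neighbors.
        for u in remaining[v]:
            remaining[u].discard(v)
        del remaining[v]
    return best
```

On a triangle with one pendant vertex, the pendant vertex (degree 1) is deleted first and the triangle then yields the recorded degree 2, so the bound is 2. This quadratic-time sketch favors readability; a bucket structure indexed by degree would be the usual way to make it fast.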



The Improved Graph with Common Neighbors. For a graph G = (V, E),
let the neighbor improved graph G' = (V, E') of G be the graph obtained by
adding an edge [v, u] to E for all pairs v, u ∈ V such that v and u have at
least k + 1 common neighbors in G.

Proposition 3.2. (See [4]) If the treewidth of G is at most k, then the treewidth 
of the neighbor improved graph G' of G is at most k. Moreover, any tree- 
decomposition of G with treewidth at most k is also a tree-decomposition of the 
neighbor improved graph with treewidth at most k, and vice-versa. 

We can use Proposition 3.2 to modify the initial graph and to compute a better
lower bound by using the following algorithm (cf. Algorithm 3.2).

1. Assume that LB is a lower bound for the treewidth of G, i.e. tw(G) ≥ LB.

2. Let us suppose that tw(G) = LB. Then we can add an edge between each pair
of vertices which have at least LB + 1 common neighbors without modifying the
treewidth of the graph.

3. Compute a lower bound LB' for the treewidth of the improved graph G'.

4. If LB' > LB, we have a contradiction. So, the initial assumption
tw(G) = LB was false.

5. We deduce LB < tw(G), so LB + 1 is also a lower bound.

6. Repeat the process while there is a contradiction.

When there is no more contradiction, we cannot deduce any more information.
The bound used is MMD, as it returns good results within a very small
computing time. We denote this algorithm LB_N.
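
The construction of the neighbor improved graph can be sketched as follows. This is an illustrative sketch only: the function name and the adjacency-set representation are assumptions, and common neighbors are counted once in the original graph G, as in the definition above.

```python
from itertools import combinations


def neighbor_improved_graph(adj, k):
    """Return G': G plus an edge between every pair of vertices that have
    at least k + 1 common neighbors in G.

    adj maps every vertex to the set of its neighbors; it is not modified.
    """
    improved = {v: set(nbrs) for v, nbrs in adj.items()}
    for v, u in combinations(adj, 2):
        if len(adj[v] & adj[u]) >= k + 1:
            improved[v].add(u)
            improved[u].add(v)
    return improved
```

For the complete bipartite graph with sides {a, b} and {1, 2, 3}, taking k = 2 adds only the edge [a, b], since a and b are the only pair with three common neighbors.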



The Improved Graph and Vertex Disjoint Paths. The same technique can
be applied to another improved graph. We now consider vertex disjoint paths
instead of common neighbors. For a graph G = (V, E), let the paths improved
graph G'' = (V, E'') of G be the graph obtained by adding an edge [v, u] to E
for all pairs v, u ∈ V such that there are at least k + 1 vertex disjoint paths
between v and u.

Proposition 3.3. (See [4]) If the treewidth of G is at most k, then the
treewidth of the paths improved graph G'' of G is at most k. Moreover, any
tree-decomposition of G with treewidth at most k is also a tree-decomposition of
the paths improved graph with treewidth at most k, and vice-versa.

To compute this bound, we have to create a new directed graph D from G
as follows.

1. Each edge [v, u] ∈ E is replaced by two arcs (v, u) and (u, v).

2. Each vertex v is replaced by two vertices v_1 and v_2.

3. Each arc (x, v) is replaced by an arc (x, v_1).

4. Each arc (v, x) is replaced by an arc (v_2, x).

5. An arc (v_1, v_2) with weight 1 is added.

6. A weight +∞ is associated with each arc added at steps 3 and 4.

We add the edge [v, u] in G if the maximum flow f(v_2, u_1) between the two
corresponding vertices v_2 and u_1 in D is strictly greater than k.

The computing time of this method is far larger, as we have to solve several
network flow problems, but the results are improved dramatically because the
set of added edges is far larger than the previous one. We denote this
algorithm LB_P.
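
The vertex-splitting construction above can be sketched with a plain augmenting-path maximum flow. This is a hedged illustration rather than the authors' implementation: the function names are hypothetical, the flow routine is a simple breadth-first augmenting search, the value n = |V| stands in for the weight +∞ (no flow between non-adjacent vertices can reach it), and pairs that are already adjacent are skipped.

```python
from collections import deque
from itertools import combinations


def disjoint_paths(adj, s, t):
    """Number of vertex disjoint paths between non-adjacent vertices s and t."""
    inf = len(adj)  # stands in for +infinity
    cap = {}

    def add_arc(a, b, c):
        cap.setdefault(a, {})[b] = c
        cap.setdefault(b, {}).setdefault(a, 0)  # residual arc

    for v in adj:
        add_arc((v, 1), (v, 2), 1)        # step 5: arc (v_1, v_2) of weight 1
        for x in adj[v]:
            add_arc((v, 2), (x, 1), inf)  # steps 1-4 and 6: arc (v_2, x_1)

    source, sink = (s, 2), (t, 1)
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            a = queue.popleft()
            for b, c in cap[a].items():
                if c > 0 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if sink not in parent:
            return flow
        # Bottleneck capacity along the path, then augmentation.
        push, b = inf, sink
        while parent[b] is not None:
            push = min(push, cap[parent[b]][b])
            b = parent[b]
        b = sink
        while parent[b] is not None:
            a = parent[b]
            cap[a][b] -= push
            cap[b][a] += push
            b = a
        flow += push


def paths_improved_graph(adj, k):
    """Add an edge between non-adjacent pairs joined by at least k + 1
    vertex disjoint paths."""
    improved = {v: set(nbrs) for v, nbrs in adj.items()}
    for v, u in combinations(adj, 2):
        if u not in adj[v] and disjoint_paths(adj, v, u) >= k + 1:
            improved[v].add(u)
            improved[u].add(v)
    return improved
```

On a cycle with four vertices, each pair of opposite vertices is joined by two vertex disjoint paths, so k = 1 turns the cycle into a complete graph while k = 2 leaves it unchanged.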



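Algorithm 3.2 Improving the Lower Bound

LB := lower_bound(G)
repeat
    LB := LB + 1
    G' := improve_graph(G, LB)   {assume tw(G) = LB − 1: join the pairs meeting the criterion with threshold LB}
    LB' := lower_bound(G')
until LB' < LB   {no contradiction with tw(G) = LB − 1}
LB := LB − 1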
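
The iterative scheme can be sketched in Python for the common-neighbors variant (LB_N), with MMD as the underlying bound. This is a self-contained sketch under stated assumptions: the function names, the adjacency-set representation, and the small test graph built by subdivided_k4 are all hypothetical and chosen only to show one case where the bound actually improves.

```python
from itertools import combinations


def mmd(adj):
    """Maximum Minimum Degree bound (Algorithm 3.1)."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    best = 0
    while remaining:
        v = min(remaining, key=lambda x: len(remaining[x]))
        best = max(best, len(remaining[v]))
        for u in remaining[v]:
            remaining[u].discard(v)
        del remaining[v]
    return best


def neighbor_improved_graph(adj, k):
    """G plus an edge for each pair with at least k + 1 common neighbors."""
    improved = {v: set(nbrs) for v, nbrs in adj.items()}
    for v, u in combinations(adj, 2):
        if len(adj[v] & adj[u]) >= k + 1:
            improved[v].add(u)
            improved[u].add(v)
    return improved


def improve_lower_bound(adj):
    """Raise the MMD bound while assuming tw(G) = lb leads to a contradiction."""
    lb = mmd(adj)
    while True:
        improved = neighbor_improved_graph(adj, lb)  # valid if tw(G) = lb
        if mmd(improved) > lb:   # contradiction, so tw(G) >= lb + 1
            lb += 1
        else:
            return lb


def subdivided_k4(copies=3):
    """Hypothetical example: four hubs, each pair of hubs linked through
    `copies` intermediate vertices of degree 2."""
    adj = {a: set() for a in range(4)}
    for a, b in combinations(range(4), 2):
        for c in range(copies):
            mid = (a, b, c)
            adj[mid] = {a, b}
            adj[a].add(mid)
            adj[b].add(mid)
    return adj
```

On the example graph the MMD bound is 2, because the degree-2 vertices are always deleted first. Every pair of hubs has three common neighbors, so under the assumption tw(G) = 2 the hubs become a clique of size 4, whose MMD bound is 3. The contradiction raises the bound to 3, and the next round adds no edge.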



4 A New Heuristic to Compute the Treewidth 

4.1 Triangulated Graphs 

Computing the treewidth of a triangulated graph takes linear time. So, many
heuristics use the properties of triangulated graphs to find upper bounds for the
treewidth. A graph G is triangulated if for every cycle of length k > 3, there is
a chord joining two non-consecutive vertices [16].

Proposition 4.1. (See [9]) If G is triangulated, tw(G) = ω(G) − 1. Moreover,
if G is triangulated, computing ω(G), and thus tw(G), has complexity O(n + m).

So, given an arbitrary graph G, it is interesting to find a triangulated graph
which contains G to get an upper bound for tw(G). The triangulated graph
H = (V, E + E_t), with E ∩ E_t = ∅, is called a triangulation for G. From the
following property, we know that the optimal value of tw(G) can be found using
this technique:

Proposition 4.2. (See [17]) For every graph G, there exists a triangulation H*
such that tw(H*) = tw(G). Moreover, any tree-decomposition for G is a
tree-decomposition for H* and vice-versa.

Computing the treewidth of G is equivalent to finding such a graph H* (and
also equivalent to finding the triangulation with the minimum maximum clique
size). This problem is NP-hard, as computing the treewidth is NP-hard, but
each triangulation for G gives an upper bound for tw(G).

A vertex is said to be simplicial if its neighborhood is a clique [16]. Let
G = (V, E) be a graph. An ordering (or elimination ordering) σ(1, 2, ..., n) of
V is a perfect elimination ordering if and only if ∀ i ∈ [1, ..., n], σ(i) is a
simplicial vertex in G[{σ(i), ..., σ(n)}], the subgraph induced by the higher
ordered vertices.

Proposition 4.3. (See [8]) G is triangulated if and only if it has a perfect 
elimination ordering. 
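
The definition of a perfect elimination ordering translates directly into a check. The sketch below is illustrative only: the function name and the adjacency-set representation are assumptions, and the ordering is given as a list whose i-th element plays the role of σ(i).

```python
from itertools import combinations


def is_perfect_elimination_ordering(adj, order):
    """True if every vertex is simplicial in the subgraph induced by itself
    and the vertices that come later in `order`."""
    position = {v: i for i, v in enumerate(order)}
    for v in order:
        later = [u for u in adj[v] if position[u] > position[v]]
        # The later neighbors of v must be pairwise adjacent.
        for a, b in combinations(later, 2):
            if b not in adj[a]:
                return False
    return True
```

For two triangles sharing an edge, an ordering that starts with the two vertices of degree 2 passes the check. In a chordless cycle on four vertices, the first vertex of any ordering has two non-adjacent later neighbors, so no ordering passes, in line with Proposition 4.3.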

Let G = (V, E) be a non-triangulated graph and σ be an elimination ordering.
We denote by H(σ) = (V, E + E_t(σ)) the triangulation of G obtained by applying
the following algorithm [17]: the vertices are eliminated in the elimination
ordering σ. At each step i of the algorithm, the edges necessary to make
v = σ(i) a simplicial vertex are added to the current graph (i.e. an edge is
added between each pair of neighbors of v). Then the vertex is deleted (cf.
Algorithm 4.1). Let m' = |E + E_t(σ)|. Algorithm 4.1 can be implemented in
O(n + m') time [18]. We denote by N_t(v) the set of neighbors of v in H.

So, computing the treewidth can be solved this way: find an elimination
ordering which minimizes the maximum clique size of the triangulated graph
obtained by Algorithm 4.1. For such an ordering, the treewidth is equal to
max_{v ∈ V} |{u ∈ N_t(v) / σ^{-1}(v) < σ^{-1}(u)}|, the maximum degree of a
vertex when it is eliminated.

4.2 The New Heuristic 

The min degree algorithm is a classical triangulation heuristic. The method
starts from the initial graph G. At each step of the algorithm, the vertex v with
minimum degree in the current graph is chosen. First, the edges necessary to
make v a simplicial vertex are added. Then v is deleted from the graph.
The process is repeated for the remaining graph until the vertex set is empty.
Locally, the choice of the vertex with minimum degree is suitable. Indeed, the
size of the clique induced by the elimination of v is minimized, but this strategy
is not optimal.

Algorithm 4.1 Graph Triangulation Associated to an Elimination Ordering σ

for v ∈ V do
    N_t(v) := N(v)
end for
E_t := ∅
for i := 1 to n do
    v := σ(i)
    for u, z ∈ N_t(v) do
        E_t := E_t ∪ [u, z]   {add edge to triangulation}
        N_t(u) := N_t(u) ∪ {z}   {update neighborhood}
        N_t(z) := N_t(z) ∪ {u}
    end for
    for u ∈ N_t(v) do
        N_t(u) := N_t(u) \ {v}   {v is eliminated}
    end for
end for
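
Algorithm 4.1 can be sketched in Python so that it also reports the width obtained with a given ordering, i.e. the maximum degree of a vertex at the moment it is eliminated. The function name, the return convention and the adjacency-set representation are assumptions made for this example; unlike the pseudocode, the sketch records only the edges that were not already present.

```python
from itertools import combinations


def elimination_width(adj, order):
    """Eliminate the vertices in `order`; return (width, fill edges).

    width is the largest number of remaining neighbors a vertex has when it
    is eliminated; fill edges are the edges E_t added by the triangulation.
    """
    nt = {v: set(nbrs) for v, nbrs in adj.items()}  # N_t(v) := N(v)
    fill = set()
    width = 0
    for v in order:
        nbrs = nt.pop(v)
        width = max(width, len(nbrs))
        for u in nbrs:
            nt[u].discard(v)              # v is eliminated
        for u, z in combinations(nbrs, 2):
            if z not in nt[u]:            # make the neighbors of v a clique
                nt[u].add(z)
                nt[z].add(u)
                fill.add(frozenset((u, z)))
    return width, fill
```

On the chain 0, 1, 2, eliminating an end vertex first gives width 1 and no added edge, whereas eliminating the middle vertex first gives width 2 and one added edge, which shows how much the ordering matters.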



As the degree of the eliminated vertex is not a sufficient criterion, we add
a more global criterion to improve the quality of the results. The idea is to
compute a lower bound for the treewidth of the graph obtained after elimination
of v. We want our algorithm to be a fast one, so we use the MMD bound [12].

Let G_v = (V_v, E_v) be the graph obtained after eliminating the vertex v. It is
computed with the two following operations:

• connect all current neighbors of v in G

• remove v from G

As the treewidth of the remaining graph is at least as large as the bound
computed, the global criterion has the larger weight. The vertex v* chosen is
the one which minimizes the function 2 * lower_bound(G_v) + |N_t(v)|. We denote
this algorithm D_LB (Algorithm 4.2).
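
A compact Python sketch of the D_LB selection rule is given below. It is not the authors' C code: the function names and the adjacency-set representation are assumptions, ties are broken by iteration order, and G_v is recomputed from scratch for every candidate, which is far slower than an incremental implementation would be.

```python
def mmd(adj):
    """Maximum Minimum Degree bound (Algorithm 3.1)."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    best = 0
    while remaining:
        v = min(remaining, key=lambda x: len(remaining[x]))
        best = max(best, len(remaining[v]))
        for u in remaining[v]:
            remaining[u].discard(v)
        del remaining[v]
    return best


def eliminate(adj, v):
    """Return G_v: connect all current neighbors of v, then remove v."""
    g = {x: set(nbrs) for x, nbrs in adj.items() if x != v}
    nbrs = adj[v]
    for u in nbrs:
        g[u].discard(v)
        g[u] |= nbrs - {u}
    return g


def d_lb(adj):
    """Greedy upper bound: repeatedly eliminate the vertex minimizing
    |N_t(v)| + 2 * lower_bound(G_v), with MMD as the lower bound."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    width, order = 0, []
    while g:
        best = min(g, key=lambda v: len(g[v]) + 2 * mmd(eliminate(g, v)))
        width = max(width, len(g[best]))  # degree of v* when eliminated
        order.append(best)
        g = eliminate(g, best)
    return width, order
```

The first returned value is an upper bound on the treewidth and the second is the elimination ordering that achieves it. On a cycle with four vertices the sketch returns width 2, and on a complete graph with four vertices it returns width 3.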

5 Computational Analysis 

We have tested our methods on the DIMACS benchmarks for vertex coloring. For
many of these graphs, ω(G) is known, and the lower bounds are far from the best
heuristic results. We compare our methods with the most recent ones (ω(G) − 1
and MMD [12] for the lower bound; Lex and MSVS [12] and the Tabu PEO [7] for
the upper bound). Note that some values of ω(G) are unknown. Our algorithms
are implemented in C on a Pentium III 1 GHz. There is no randomization in our
algorithms, so each method is run once. For comparison, the computer used for
the tabu search is the same, and the computation times of Lex and MSVS were
obtained by Koster et al. with a C++ implementation on a Pentium III 800 MHz.

Algorithm 4.2 Greedy Algorithm for the Treewidth

k := 0   {maximum degree found}
for v ∈ V do
    N_t(v) := N(v)
end for
for i := 1 to n do
    for v ∈ V do
        compute(G_v)
    end for
    v* := argmin_{v ∈ V} (|N_t(v)| + 2 * lower_bound(G_v))
    k := max(k, |N_t(v*)|)   {update width}
    for u, z ∈ N_t(v*) do
        N_t(u) := N_t(u) ∪ {z}   {update neighborhood}
        N_t(z) := N_t(z) ∪ {u}
    end for
    for u ∈ N_t(v*) do
        N_t(u) := N_t(u) \ {v*}   {v* is eliminated}
    end for
    V := V \ {v*}
end for

Our lower bounds can improve the results of the previous best bound (MMD)
by a wide margin. The results for the 62 instances are reported in Table 1. For
22 of them, the LB_N method improves the value of the lower bound. This number
increases to 53 with the LB_P method. The computing times of our methods are
larger, so they should not be used throughout a branch & bound search.
They can be used at the top levels, in order to cut a very large number of
branches. Moreover, as some lower bounds reach the upper bounds, there is no
need to launch an enumerative method for several instances (thanks to our
method, the value of the treewidth is known for 8 new benchmarks: FPSOL2.I.1,
INITHX.I.1, MILES500, MULSOL.I.1, MULSOL.I.2, MULSOL.I.3, MULSOL.I.4, and
ZEROIN.I.1).

The results of the heuristic are on average as good as those returned by MSVS
and Lex, but the computing times are far smaller. It is a method which finds good
bounds using a very small computing time. Furthermore, it seems to be more
stable than Lex, and it never returns results as bad as those Lex returns, for
example, on the INITHX graphs (the values returned by Lex are more than six
times greater than those returned by the three other methods). Our
computational experiments for the upper bounds are reported in Table 2.

6 Conclusion 

We have proposed new methods to compute lower and upper bounds for the
treewidth of graphs. The improvement of the lower bound is very significant,
and allows us to establish new exact values of treewidth for several graphs of the
DIMACS benchmark for graph coloring. The heuristic is a very fast method
for finding good quality upper bounds.

The gap between the best upper bounds and lower bounds remains large.
Indeed, the exact value of treewidth is not known for most of the benchmarks.
So, to reduce the gap between the bounds, we have to improve the value of the
lower bound again before working on an exact method.

Table 1. Lower bound



                           ub  lb                       CPU time
instance       n      m  Tabu  ω(G)−1  MMD  LB_N  LB_P   MMD   LB_N      LB_P
anna         138    986    12      10   10    10    11  0.01   0.01      8.25
david         87    812    13      10   10    10    11  0.00   0.01      7.08
huck          74    602    10      10   10    10    10  0.00      0      1.09
homer        561   3258    31      12   12    14    21  0.04   2.19   1202.78
jean          80    508     9       9    9     9     9  0.00      0      2.76
games120     120    638    33       8    8     8    12  0.00   0.77     11.07
QUEEN5_5      25    160    18       4   12    12    12  0.00      0      0.88
QUEEN6_6      36    290    25       6   15    15    15  0.00   0.05      2.22
QUEEN7_7      49    476    35       6   18    18    20  0.01   0.33      1.57
QUEEN8_8      64    728    46       8   21    21    23  0.00   0.22      6.50
QUEEN9_9      81   1056    58       9   24    24    26  0.01   0.06     21.97
QUEEN10_10   100   1470    72       -   27    27    31  0.01   0.16     16.78
QUEEN11_11   121   1980    88      10   30    30    34  0.01   0.33     52.07
QUEEN12_12   144   2596   104       -   33    33    37  0.02    1.1    144.16
QUEEN13_13   169   3328   122      12   36    36    42  0.04   1.05    111.91
QUEEN14_14   196   4186   141       -   39    39    45  0.07   1.76    283.24
QUEEN15_15   225   5180   163       -   42    42    48  0.11   3.19    653.83
QUEEN16_16   256   6320   186       -   45    45    53  0.21   4.50    519.60
FPSOL2.I.1   269  11654    66      64   64    66    66  1.03   0.44   1003.86
FPSOL2.I.2   363   8691    31      29   31    31    31  0.51   0.38   3295.79
FPSOL2.I.3   363   8688    31      29   31    31    31  0.67   0.39   2893.60
INITHX.I.1   519  18707    56      53   55    56    56  5.83   0.88  23995.88
INITHX.I.2   558  13979    35      30   31    31    31  2.37   3.62  67708.76
INITHX.I.3   559  13969    35      30   31    31    31  2.48   4.01  54247.00
MILES1000    128   3216    49      41   41    44    48  0.07   0.66    268.73
MILES1500    128   5198    77      72   72    76    76  0.10   0.06    117.10
MILES250     125    387     9       7    7     8     8  0.00   0.05     26.80
MILES500     128   1170    22      19   19    21    22  0.01      0     42.44
MILES750     128   2113    36      30   31    32    33  0.02   0.06    438.35
MULSOL.I.1   138   3925    50      48   48    50    50  0.07   0.22    302.88
MULSOL.I.2   173   3885    32      30   31    32    32  0.09   0.11    256.44
MULSOL.I.3   174   3916    32      30   31    32    32  0.09      0    250.10
MULSOL.I.4   175   3946    32      30   31    32    32  0.09   0.06    252.96
MULSOL.I.5   176   3973    31      30   31    31    31  0.09      0    363.64
MYCIEL3       11     20     5       3    3     3     4  0.00      0         0
MYCIEL4       23     71    10       4    5     5     6  0.00      0      0.48
MYCIEL5       47    236    19       5    8     8    12  0.00   0.33      0.52
MYCIEL6       95    755    35       6   12    15    20  0.01   0.33      6.98
MYCIEL7      191   2360    66       7   18    25    34  0.03   0.77     98.38
SCHOOL1      385  19095   188       -   73    88   116  4.79   62.4   5960.29
SCHOOL1_NSH  352  14612   162       -   61    72   100  3.08  32.12   4142.04
ZEROIN.I.1   126   4100    50      48   48    50    50  0.17   0.11     53.82
ZEROIN.I.2   157   3541    32      29   29    31    31  0.09   0.28    290.09
ZEROIN.I.3   157   3540    32      29   29    31    31  0.07   0.11    283.54
LE450_5A     450   5714   256       4   17    17    33  0.35   28.4    686.56
LE450_5B     450   5734   254       4   17    17    33  0.46  29.38    608.17
LE450_5C     450   9803   272       4   33    33    51  1.38  38.77   1588.40
LE450_5D     450   9757   278       4   32    32    51  2.30  40.37  34947.64
LE450_15A    450   8168   272      14   24    24    56  0.39  35.87   1465.50
LE450_15B    450   8169   270      14   24    24    55  0.80  36.75   1678.40
LE450_15C    450  16680   359      14   49    49    92  7.58  79.43   5558.45
LE450_15D    450  16750   360      14   51    51    91  8.76  77.94   5422.94
LE450_25A    450   8260   234      24   26    27    62  1.31  36.41   1723.47
LE450_25B    450   8263   233      24   25    25    59  1.20  33.39   1692.10
LE450_25C    450  17343   327      24   52    52   100  8.13  74.48   5893.32
LE450_25D    450  17425   336      24   51    51    98  5.31  79.59   5635.75
DSJC125.1    125    736    66       -    8     8    16  0.01   0.83      6.28
DSJC125.5    125   3891   109       -   53    53    62  0.07   0.76    105.36
DSJC125.9    125   6961   119       -  103   107   108  0.20   0.28     77.51
DSJC250.1    250   3218   173       -   18    18    32  0.09   5.16    131.33
DSJC250.5    250  15668   232       -  109   109   125  3.62  11.31   3021.03
DSJC250.9    250  27897   243       -  211   213   218  8.20   6.31   1905.30

Table 2. Upper bound



                           lb  ub                     CPU time
instance       n      m  LB_P  LEX  MSVS  D_LB  Tabu       LEX       MSVS      D_LB       Tabu
anna         138    986    11   12    12    12    12      1.24      18.39     0.880    2776.93
david         87    812    11   13    13    13    13      0.56       7.77     0.220     796.81
huck          74    602    10   10    10    10    10      0.24       2.30     0.130     488.76
homer        561   3258    21   37    31    32    31     68.08     556.82    27.380  157716.56
jean          80    508     9    9     9     9     9      0.29       1.98     0.130     513.76
games120     120    638    12   37    51    41    33      5.20      65.97     1.620    2372.71
QUEEN5_5      25    160    12   18    18    18    18      0.04       0.22     0.340     100.36
QUEEN6_6      36    290    15   26    28    27    25      0.16       1.16     0.140     225.55
QUEEN7_7      49    476    20   35    38    38    35      0.51       4.66     0.090     322.40
QUEEN8_8      64    728    23   46    49    50    46      1.49      16.38     0.350     617.57
QUEEN9_9      81   1056    26   59    66    64    58      3.91      47.35     0.740    1527.13
QUEEN10_10   100   1470    31   73    79    80    72      9.97     128.30     1.670    3532.78
QUEEN11_11   121   1980    34   89   101   102    88     23.36     310.83     3.160    5395.74
QUEEN12_12   144   2596    37  106   120   117   104     49.93     702.29     6.720   10345.14
QUEEN13_13   169   3328    42  125   145   141   122    107.62    1589.77    10.940   16769.58
QUEEN14_14   196   4186    45  145   164   164   141    215.36    3275.75    20.300   29479.91
QUEEN15_15   225   5180    48  167   192   194   163    416.25    6002.33    31.070   47856.25
QUEEN16_16   256   6320    53  191   214   212   186    773.09   11783.30    63.890   73373.12
FPSOL2.I.1   269  11654    66   66    66    66    66    319.34    4220.91   176.110   63050.58
FPSOL2.I.2   363   8691    31   52    31    31    31    622.22    8068.88   174.930   78770.05
FPSOL2.I.3   363   8688    31   52    31    31    31    321.89    8131.78   144.650   79132.70
INITHX.I.1   519  18707    56  223    56    56    56   3144.95   37455.10  2966.020  101007.52
INITHX.I.2   558  13979    31  228    35    35    35   5567.96   37437.20  1004.340  121353.69
INITHX.I.3   559  13969    31  228    35    35    35   5190.39   36566.80   884.430  119080.85
MILES1000    128   3216    48   49    53    53    49     14.39     229.00     3.420    5696.73
MILES1500    128   5198    76   77    83    77    77     29.12     268.19     3.470    6290.44
MILES250     125    387     8   10     9     9     9      1.12      10.62     0.350    1898.29
MILES500     128   1170    22   22    28    28    22      4.37      87.18     0.960    4659.31
MILES750     128   2113    33   37    38    43    36      8.13     136.69     1.850    3585.68
MULSOL.I.1   138   3925    50   66    50    50    50     17.77     240.24    12.700    3226.77
MULSOL.I.2   173   3885    32   69    32    32    32     34.06     508.71    15.290   12310.37
MULSOL.I.3   174   3916    32   69    32    32    32     34.58     527.89    14.010    9201.45
MULSOL.I.4   175   3946    32   69    32    32    32     35.53     535.72    14.100    8040.28
MULSOL.I.5   176   3973    31   69    31    31    31     36.25     549.55    12.920   13014.81
MYCIEL3       11     20     4    5     5     5     5      0.00       0.01     0.000      72.50
MYCIEL4       23     71     6   11    11    11    10      0.02       0.13     0.310      84.31
MYCIEL5       47    236    12   23    20    20    19      0.28       2.00     0.300     211.73
MYCIEL6       95    755    20   47    35    35    35      4.56      29.83     2.410    1992.42
MYCIEL7      191   2360    31   94    74    70    66    109.86     634.32    28.640   19924.58
SCHOOL1      385  19095   116  252   244   242   188   3987.64   41141.10   273.620  137966.73
SCHOOL1_NSH  352  14612   100  192   214   200   162   2059.52   28954.90   161.700  180300.10
ZEROIN.I.1   126   4100    50   50    50    50    50     17.78     338.26    10.680    2595.92
ZEROIN.I.2   157   3541    31   40    33    33    32     24.82     448.74    26.760    4825.51
ZEROIN.I.3   157   3540    31   40    33    33    32     24.69     437.06    24.780    8898.80
LE450_5A     450   5714    33  310   317   323   256   7836.99   73239.66   274.490  130096.77
LE450_5B     450   5734    33  313   320   321   254   7909.11   73644.28   260.290  187405.33
LE450_5C     450   9803    51  348   340   329   272  10745.70  103637.17   525.350  182102.37
LE450_5D     450   9757    51  349   326   318   278  10681.29   96227.40   566.610  182275.69
LE450_15A    450   8168    56  296   297   300   272   6887.15   59277.90   273.700  117042.59
LE450_15B    450   8169    55  296   307   305   270   6886.84   65173.20   230.900  197527.14
LE450_15C    450  16680    92  379   376   379   359  12471.09  122069.00   356.610  143451.73
LE450_15D    450  16750    91  379   375   380   360  12481.22  127602.00   410.350  117990.30
LE450_25A    450   8260    62  255   270   267   234   4478.30   53076.40   243.290  143963.41
LE450_25B    450   8263    59  251   264   266   233   4869.97   52890.00   248.610  184165.21
LE450_25C    450  17343   100  355   365   361   327  10998.68  109141.00   344.360  151719.58
LE450_25D    450  17425    98  356   359   362   336  11376.02  111432.25   434.120  189175.40
DSJC125.1    125    736    16   70    67    67    66     12.90     171.54     2.500    1532.93
DSJC125.5    125   3891    62  110   110   110   109     38.07     254.90     3.870    2509.97
DSJC125.9    125   6961   108  119   120   120   119     55.60      70.79    56.630    1623.44
DSJC250.1    250   3218    32  183   179   176   173    528.10    5507.86    32.730   28606.12
DSJC250.5    250  15668   125  233   233   233   232   1111.66    7756.38    48.510   14743.35
DSJC250.9    250  27897   218  243   244   244   243   1414.58    1684.83    15.600   30167.70



Abstract. We consider skewed distributions of strings, in which any
two such strings share a common prefix much longer than that expected
in uniformly distributed (random) strings. For instance, this is the case
of URL addresses, IP addresses, or XML path strings, all representing
paths in some hierarchical order. As strings sharing a portion of the path
have a quite long common prefix, we need to avoid the time-consuming
repeated examination of these common prefixes while handling the linked
data structures storing them. For this purpose, we show how to implement
search data structures that can operate on strings with long prefixes
in common. Despite the simplicity and the generality of the method, our
experimental study shows that it is quite competitive with several
optimized and tuned implementations currently available in the literature.



1 Introduction 

In many applications keys are arbitrarily long, such as strings, multidimensional
points, multiple-precision numbers, or multi-key data, and are modelled as
k-dimensional keys for a given positive integer k > 1, or as variable-length keys.
The latter can be virtually padded with a sufficient number of string terminators
so that they can all be considered to have the same length k. When dealing with
skewed distributions (shortly, skewed strings), such as URL addresses, IP
addresses, or XML path strings representing paths in some hierarchical order, we
can observe that they typically share long prefixes. It is more realistic to assume
that the average length of the common prefix of any two such strings is much
longer than that expected in the case of uniformly distributed (random) strings.

A reasonable measure of “skewness,” denoted ℒ(S), for a given set S of n
strings x_1, ..., x_n over an alphabet Σ can be formalized as follows. Let m_i be
the length of the longest matched prefix of x_i against the previous strings
x_1, ..., x_{i−1}. Define $m = \frac{1}{n-1} \sum_{i=2}^{n} m_i$ as the average
length of matched prefixes in S. Note that the definition of m does not depend
on the order of the keys in S, as (m + 1) is equivalent to the internal path
length of the |Σ|-ary trie built on S, which is independent of the order of
insertions. We use the fact that (m + 1) ≈ log_{|Σ|} n when the strings in S are
independent and identically distributed [31, Chap. 4]. Hence, we measure the
skewness (or non-randomness) of the strings in S by the ratio
ℒ(S) = (m + 1) / log_{|Σ|} n. In a certain sense, a large value of m is
a good indicator of the skewness of the strings, which is normalized by that
same indicator for random strings. Alternatively, since log_{|Σ|} n is
proportional to the height of a random |Σ|-ary tree with n nodes [31, Chap. 4],
we can see ℒ(S) as the average number of characters matched per level in a
random trie. The intuition is that ℒ(S) is small in random strings, and the
dominant cost is given by traversing the access path from the root to a node.
For skewed strings, ℒ(S) is large and the dominant cost is due to scanning the
long prefixes of the strings.
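
The skewness ratio can be computed directly from its definition. The sketch below is illustrative and rests on assumptions: the function names are hypothetical, the alphabet size is passed in explicitly, m is taken to be the sum of the m_i for i = 2, ..., n divided by n − 1, and the quadratic scan over previous strings is used only for clarity (a trie would give the same m_i more efficiently).

```python
import math


def lcp(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def skewness(strings, alphabet_size):
    """Ratio (m + 1) / log_{alphabet_size} n for a list of n >= 2 strings."""
    n = len(strings)
    # m_i: longest prefix of x_i matched by any previous string.
    matched = [max(lcp(strings[i], s) for s in strings[:i]) for i in range(1, n)]
    m = sum(matched) / (n - 1)
    return (m + 1) / math.log(n, alphabet_size)
```

For the four binary strings aa, ab, ba, bb the matched lengths are 1, 0 and 1, so m = 2/3 and the ratio is (5/3) / 2. Prepending the same long prefix to every string raises each m_i by the prefix length and increases the ratio accordingly.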

Fast searching on a dynamic set S of skewed strings can be supported by
choosing from a vast repertoire of basic data structures, such as AVL-trees [2],
red-black trees [4, 14, 32], (a, b)-trees [16], weight-balanced BB[α]-trees [22],
finger search trees, self-adjusting trees [30], and random search trees [27], just
to name a few. Many of these data structures exhibit interesting combinatorial
properties that make them attractive both from the theoretical and from the
practical point of view. They are defined on an ordered set of (one-dimensional or
one-character) keys, so that searching is driven by comparisons against the keys
stored in their nodes: it is usually assumed that any two keys can be compared in
O(1) time. Their operations can be extended to long keys by performing string
comparisons while routing the keys for searching and updating. This approach
may work well in the case of uniformly distributed strings, since a mismatching
character is found in expected constant time. However, it is not satisfactory
for skewed strings, since a string comparison may take as much as O(k) time
per operation, thus slowing down the performance of carefully designed data
structures by a multiplicative O(k) factor. To improve on this simple-minded
approach, we would like to avoid repeated examinations of long common prefixes
while handling the linked data structures storing them efficiently.
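
The slowdown of the simple-minded approach can be made concrete by counting the character positions examined during a lookup in a plain binary search tree of strings. This sketch only illustrates the baseline being criticized, not the technique of [12]; the class and function names are hypothetical.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def compare(a, b):
    """Return (sign, chars): the order of a versus b and the number of
    character positions examined to decide it."""
    chars = 0
    for x, y in zip(a, b):
        chars += 1
        if x != y:
            return (-1 if x < y else 1), chars
    return (len(a) > len(b)) - (len(a) < len(b)), chars


def insert(root, key):
    """Insert key into an unbalanced binary search tree."""
    if root is None:
        return Node(key)
    sign, _ = compare(key, root.key)
    if sign < 0:
        root.left = insert(root.left, key)
    elif sign > 0:
        root.right = insert(root.right, key)
    return root


def lookup(root, key):
    """Return (found, total number of character positions examined)."""
    total = 0
    while root is not None:
        sign, chars = compare(key, root.key)
        total += chars
        if sign == 0:
            return True, total
        root = root.left if sign < 0 else root.right
    return False, total
```

With seven one-character keys the lookup of the largest key examines 3 positions, one per node on the path. When every key is preceded by the same ten-character prefix, the same lookup examines 33 positions, because the shared prefix is scanned again at each of the three nodes.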

Several approaches for solving this problem are possible, such as using tries
(see e.g. [17]) or ad hoc data structures for k-dimensional keys, for which we
have many examples [5, 6, 8, 10, 11, 13, 19, 30, 33, 34, 35]. The height of all
these data structures is O(k + log n), where k is the maximum key length and n
the total number of keys. A more general paradigm, described in [12], makes
use of algorithmic techniques capable of augmenting many kinds of
(heterogeneous) linked data structures so that they can operate on k-dimensional
keys. Consequently, the richness of results on data structures originally designed
for one-dimensional keys is available for k-dimensional keys in a smooth, simple
and efficient way, without incurring the previously mentioned O(k)-slowdown.

A potential drawback of general techniques, such as the ones in [12], is that
they may produce data structures that are difficult to use in real-life
applications, which could take some advantage of their efficient implementation
for skewed strings. Just to name one, Internet “interceptor” software blocks access
to undesirable Internet sites, storing huge lists of URLs to be blocked, and
checking by prefix searches in these URLs. Other applications may be in networking
for filtering purposes and in databases for storing and searching XML records.
Note that hashing is of little help in these situations, as it cannot perform the
search of prefixes of the keys, which are scattered throughout the memory. Instead,
the above data structures are more powerful than hashing, since they allow for
prefix and one-dimensional range searching and for sorting keys.

In this paper, we pursue an algorithm engineering avenue to show that the
techniques in [12] may be of practical value. We therefore discuss the
implementation issues from the programmers' perspective. In particular, we show
that the general transformation from one-character to k-character keys can be
carried out by following a rather general and simple scheme, so that a given
implementation of a data structure can be easily modified in order to handle
k-character keys without incurring the O(k)-slowdown. In principle, we can start
from several basic data structures for one-character keys, and show how to make
them work for k-character keys. This can be easily done as the technique itself is
not invasive: if we start from a given data structure (e.g., a binary search tree,
an AVL tree, an (a, b)-tree, or a skip list) or a related piece of code for
one-character keys, we can produce a data structure for k-character keys which
retains exactly the same topology (and the same structural properties) as the
original structure.

We start out from the techniques in [12] for re-engineering the core of the
search and update algorithms, and we compare the resulting codes to more
optimized and tuned algorithms. In our experiments we consider the Patricia
tries [17, 21], the ternary search trees [5] and a cache-aware variant of tries [1],
which are known to be among the fastest algorithms for long keys. Timings were
normalized with the average time of linear probing hashing [17]. Our
experiments were able to locate two different thresholds on our data sets, based
on the parameter ℒ(S) for skewness. When the parameter is small, the adaptive
tries of [1] give the fastest implementation. When it is large, our techniques
deliver the fastest codes, which thus become relevant in the case of skewed
strings. For intermediate values of ℒ(S), the experimented algorithms have very
close behavior: this is the range where the ternary search trees of [5] become
very competitive. In any case, the fastest algorithms obtain a speedup factor that
is bounded by 2 or 3. As a matter of fact this is not a limitation, since the
data structures are highly tuned for strings and our general technique compares
favorably in several cases. This may allow for treating 2-3 times more queries
per time unit on a high-performance server, which is effective in practice. Our
data structures are therefore a valid alternative to ubiquitous Patricia tries and
compact tries in all their applications (indexing, sorting, compressing, etc.).

2 Engineering the Algorithms for Long Keys and Strings 

Our technique can be illustrated by running an example on the classical lookup
procedure adopted in binary search trees. The reader can follow our example
and apply the same methodology to other data structures fulfilling the
requirements described in [12]. Namely, they must be linked data structures
whose keys are totally ordered and whose search and update operations are driven
by comparisons of the keys (i.e., without hashing or bit manipulation of the
keys, for which our technique does not work). With this approach, we have
implemented the ANSI C code of several multi-character key data structures,
starting from available code for binary search trees, treaps, AVL trees [2],
(a, b)-trees [16], and skiplists [24]. In particular, for binary search trees,
treaps, and AVL trees we started from the implementation described in [36], for
skiplists we started from the implementation described in [24], which is
available via anonymous ftp at [25], while for (a, b)-trees we started from our
own implementation of [16, 20].

2.1 Preliminaries 

In our description we identify the nodes with their stored keys, and assume that
the keys fulfill a total order with two special keys $-\infty$ and $+\infty$
that are always the smallest and the largest ones. Given a node t, let $\pi$ be
the access path from the root to t. We define the successor $\pi_t^+$ of t along
path $\pi$ to be the smallest ancestor that is greater than t (i.e., we find
$\pi_t^+$ by going upward in $\pi$ until we cross a left link). The predecessor
$\pi_t^-$ is defined analogously: namely, $\pi_t^-$ is the greatest ancestor
that is smaller than t. Note that both $\pi_t^-$ and $\pi_t^+$ are well defined
because of the special keys. We store in each node t the maximum number of
initial characters that t shares with $\pi_t^-$ and with $\pi_t^+$, that is,
their longest common prefix length or, shortly, lcp. As will become clear, the
lcp of two strings permits comparing them in $O(1)$ time by simply comparing
their leftmost mismatching symbols, which occur in position lcp (we follow the
convention of numbering string positions starting from 0). We assume that the
keys are chosen from a universe keytype, such as a sequence of integers,
characters, or reals, terminated by the special null value 0.

An example of a binary search tree and its augmented version with long keys is
shown in Figure 1. In the example, the first number in each node is the value of
the lcp with its predecessor while the second number is the value of the lcp
with its successor. For example, seafood has seacoast as predecessor and surf
as successor: hence, the two lcp values are 3 and 1, respectively.
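The two lcp values quoted for seafood, and the constant-time comparison that an lcp makes possible, can be checked with a small sketch. The function names lcp_len and cmp_with_lcp are hypothetical and do not appear in the paper; keys are assumed to be null-terminated C strings.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the longest common prefix of two null-terminated strings. */
size_t lcp_len(const char *a, const char *b) {
    size_t i = 0;
    while (a[i] != '\0' && a[i] == b[i]) i++;
    return i;
}

/* Compare a and b given l = lcp_len(a, b): the characters in position l
   are the leftmost mismatch (or both terminators if the strings are equal),
   so a single character comparison decides the order. */
int cmp_with_lcp(const char *a, const char *b, size_t l) {
    /* Debug-only sanity check (linear time); it disappears under NDEBUG. */
    assert(l == lcp_len(a, b));
    return (unsigned char)a[l] - (unsigned char)b[l];
}
```

With the keys of the example, lcp_len("seafood", "seacoast") is 3 and lcp_len("seafood", "surf") is 1, and cmp_with_lcp reports that seafood follows seacoast and precedes surf.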

2.2 The Case Study of Binary Search Trees 

We begin by taking an available implementation of binary search trees, such
as the one described in [36], where each node has the regular fields key, left
and right. We may extend the functionalities to strings by using the C library
function strcmp to perform comparisons of keys. Our first step is instead that of
obtaining a preliminary version of the search procedure, called FindA, by
augmenting each node with two fields pred_lcp and succ_lcp storing the longest
common prefix of the strings pointed to by the fields key of node t and its
predecessor $\pi_t^-$, and that of node t and its successor $\pi_t^+$,
respectively. We do not need to store pointers to $\pi_t^-$ and $\pi_t^+$.
Then, we replace strcmp with function fast_scmp, which is aware of previous
comparisons of single characters. This function is at the heart of our
computation as it efficiently compares a search key x against the key in a node
t. Both keys have the same predecessor and successor with respect to the access
path from the root to t (excluded), whereas their lcps may be different.
Function fast_scmp makes use of three global static int variables:

- m denotes the number of characters matched in the search key x so far;

- left_best is the lcp between x and its current predecessor;

- right_best is the lcp between x and its current successor.

The initial value of the variables is 0. At the generic comparison step, m
characters have been matched, and at least one of left_best and right_best
equals m. To compare x and t’s key, we first set a local variable lcp to the
proper lcp between the key in node t and either its predecessor or its
successor. Namely, we select one of the fields pred_lcp and succ_lcp in the
node, driven by the invariant that at least one of left_best and right_best
equals m. If lcp is at least m in value, we may have to extend the m matched
characters by comparing one character at a time, starting from the character in
position m of both keys. In any case, we end up storing in lcp the lcp between
x and t’s key. At this point, the mismatch between the characters in position
lcp of both keys yields the outcome of fast_scmp. We do not comment further on
the source code of function fast_scmp here as it is the mere implementation of
the ideas in [12]:

int fast_scmp ( keytype x, node t ) {
    int lcp = ( left_best == m ) ? t->pred_lcp : t->succ_lcp;
    if ( m <= lcp ) {
        for ( ; (x[m] != 0) && (x[m] == t->key[m]) ; m++ ) ;
        lcp = m;
    }
    if ( x[lcp] <= t->key[lcp] )
        right_best = lcp;
    else
        left_best = lcp;
    return ( x[lcp] - t->key[lcp] );
}

As previously mentioned, once lcp is known, the comparison between x and
t->key is trivially done in the last if statement. Although satisfactory from
a theoretical point of view, fast_scmp is not always better than using strcmp;
however, it can be reused for other kinds of data structures by introducing
minimal changes in their original source code.
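To make the role of fast_scmp concrete, the following self-contained sketch plugs it into an ordinary binary search tree lookup, which is how we read the description of FindA (its source is not listed here, so the loop below is an assumption). The node layout, the helper example_tree, and the four-node tree are also assumptions: the tree contains only the keys on the access path discussed in the text, with lcp fields computed by hand, and the tree of Figure 1 may contain more nodes.

```c
#include <assert.h>
#include <stddef.h>

typedef const char *keytype;
typedef struct node_s {
    keytype key;
    int pred_lcp, succ_lcp;   /* lcp with predecessor / successor on the access path */
    struct node_s *left, *right;
} *node;

/* State shared by all the comparisons of one search. */
static int m, left_best, right_best;

static int fast_scmp(keytype x, node t) {
    int lcp = (left_best == m) ? t->pred_lcp : t->succ_lcp;
    if (m <= lcp) {
        for ( ; (x[m] != 0) && (x[m] == t->key[m]); m++) ;
        lcp = m;
    }
    if (x[lcp] <= t->key[lcp])
        right_best = lcp;
    else
        left_best = lcp;
    return x[lcp] - t->key[lcp];
}

/* Plain BST lookup with strcmp replaced by fast_scmp (assumed shape of FindA). */
node FindA(keytype x, node t) {
    assert(x != NULL);
    m = left_best = right_best = 0;        /* reset before each search */
    while (t) {
        int c = fast_scmp(x, t);
        if (c == 0) return t;
        t = (c < 0) ? t->left : t->right;
    }
    return NULL;
}

/* Hypothetical example: seacoast -> surf -> seafood -> seaside. */
node example_tree(void) {
    static struct node_s seaside  = {"seaside",  3, 1, NULL, NULL};
    static struct node_s seafood  = {"seafood",  3, 1, NULL, &seaside};
    static struct node_s surf     = {"surf",     1, 0, &seafood, NULL};
    static struct node_s seacoast = {"seacoast", 0, 0, NULL, &surf};
    return &seacoast;
}
```

Because the three counters are global, the sketch resets them at the start of every search; interleaving two searches would require making them local.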

2.3 Single-Shot Version 

The resulting code of FindA and fast_scmp in Section 2.2 can be tailored
to achieve greater efficiency by writing it in a single and compact form, called
FindB and reported below. There, static variables become local variables. At
each iteration of the main while loop, we perform the computation of lcp,
originally in fast_scmp. Next, we suitably combine the comparison of the
mismatching symbols at position lcp and the subsequent node branching to prepare
for the next iteration.

node FindB ( keytype x, node t ) {
    int m = 0, left_best = 0, right_best = 0, lcp = 0;
    while ( t ) {
        lcp = (left_best == m) ? t->pred_lcp : t->succ_lcp;
        if ( m <= lcp ) {
            for ( ; (x[m] != 0) && (x[m] == t->key[m]) ; m++ ) ;
            lcp = m; }
        if ( x[lcp] == 0 ) {
            return t;
        } else if ( x[lcp] < t->key[lcp] ) {
            t = t->left; right_best = lcp;
        } else {
            t = t->right; left_best = lcp; } }
    return t; }



2.4 Code Tuning and Faster lcp Computation

By profiling the code in FindB, we discovered that one potential bottleneck of
our implementation was the lcp computation. In particular, line profiling
revealed that we had to infer the value of lcp computed by the line

lcp = ( left_best == m ) ? t->pred_lcp : t->succ_lcp.

We remark that the purpose of the above line is to store temporarily in lcp the
lcp between t’s key and its predecessor (resp., successor) when x matches the
first m characters of that predecessor (resp., successor). We wish to avoid a
direct computation. Hence, let us assume that we have that value of lcp as a
consequence of some inductive computation, and that lcp is 0 initially. We
unroll the while loop of function FindB and run its iterations as previously
done. Then we restate each iteration according to a new scheme driven by two
cases formalized in function FindC below, in which x and t are nonempty:

node FindC ( keytype x, node t ) {
    int m = 0, lcp = 0, nextleft = 0;
    while ( 1 ) {
        if ( m <= lcp ) {  // CASE 1
            for ( ; *x == t->key[m] ; m++ )
                if ( !*x++ ) return t;
            if ( *x < t->key[m] ) {
                if ( !(t = t->left) ) return NULL;
                lcp = t->succ_lcp; nextleft = 0;
            } else {
                if ( !(t = t->right) ) return NULL;
                lcp = t->pred_lcp; nextleft = 1; }
        } else {  // CASE 2
            if ( nextleft ) {
                if ( !(t = t->left) ) return NULL;
                lcp = t->pred_lcp;
            } else {
                if ( !(t = t->right) ) return NULL;
                lcp = t->succ_lcp; } } } }

1. Case m <= lcp.¹ We compare the characters in x and t->key as before (the
inner for loop). After that, m (and lcp) stores their lcp. We branch
accordingly with the second if statement met in the iteration. Going left, the
key in the current node will become the successor of x matching its first m
characters. So, lcp will equal t->succ_lcp. Analogously, going right, the
current key will become the predecessor of x matching its first m characters
and lcp will equal t->pred_lcp. This keeps the induction on lcp for the next
while iteration.

2. Case m > lcp. We do not enter the first if condition as lcp already stores
the lcp between x and t->key. Then, we branch with the second if statement
as in case 1. An important difference with case 1 is that, here, we have
the opposite situation to keep the induction on lcp. Indeed, the key in the
current node is not matching the first m characters of x. Hence, going left, we
surely know that the predecessor of x matches its first m characters rather
than the current key (the successor). So, lcp will equal t->pred_lcp. Going
right, lcp will equal t->succ_lcp.

¹ Profiling and experiments show that splitting case 1 into two further cases
< and = does not pay off.

In order to better optimize the algorithm behind FindC, we avoid a three-way
character comparison in case 2 by introducing a variable nextleft that stores
the negation of the outcome of the comparison in the most recent execution of
case 1. In other words, if we branched left the last time we executed case 1 (so,
nextleft is 0), we have to go right in the following executions of case 2, until
another execution of case 1 comes into play. We have an analogous situation for
the right branch (nextleft is 1). Note that we no longer need the variables
left_best and right_best.

2.5 Simplification and Re-engineering of the Final Version 

Although optimized, the code of function FindC looks rather cryptic. For
this reason, we completely restructure it by eliminating the need for the local
variables, except m, and by running the main while loop in the classical
three-way branching of binary search trees. Namely, we divide the top level
search into the three standard cases for tree searching [<,=,>], rather than
cases 1-2 of Section 2.4, and design a new while loop. Even if variable lcp
disappears, we refer to cases 1-2 in equivalent terms. We say that case 1 holds
if the key in the current node matches the first m characters of x or more; we
say that case 2 holds if the key matches strictly fewer than m characters of x.

We now describe the new search function shown in Figure 1. Let us assume that
initially case 1 holds with the m-th character of the key. As the new while
loop goes on, we keep the invariant that case 1 holds at the beginning of each
iteration. That is, the key x shares at least m initial characters with t
itself (and no ancestor can share more). Then, we restate cases 1-2 in terms of
the classical three-way branching, according to the outcome of the comparison
between the characters in position m of x and t->key:

[<] We branch to the left child. We start an inner while loop that branches 
rightward as long as case 2 holds. That is exactly what the more complicated 
procedure FindC does. We exit from the inner while loop when case 1 holds 
again, so that we can start a new iteration of the main while loop.
[>] We handle this case analogously to the previous one, except that we branch
to the right child and start an inner while loop branching leftward. 

[=] We extend the match for equality as done in function FindC. At that point, 
either we find a whole match and return, or a mismatch and case 1 holds. 

For example, in the tree of Figure 1, suppose that we want to search for x = 
seashore. We start from the root seacoast with m = 0 and compare the charac- 
ters in position 0, which are equal. We therefore are in the third case, where we 
match m = 3 characters (i.e., the prefix sea). We then run another iteration of 
the main while loop, which leads to the second case. We apply goright reach- 
ing surf and, since its pred_lcp is smaller than m, we apply goleft reaching 
seafood. At this point, we compare the characters in position m and then apply 
goright reaching seaside. We run another iteration of the main while loop 
matching one further character, so that we have m = 4 (i.e., the prefix seas). 
Finally, we goleft and find a NULL pointer, completing the search with a fail- 
ure. Searching for x = seaside follows the same path, except for matching all 
characters, thus completing the search with a success. 
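The walk-through above can be replayed with a self-contained sketch of the search function of Figure 1. The node layout and the helper example_tree are assumptions: the tree below contains only the four keys met on the access path described in the text, with lcp fields filled in by hand, whereas the tree drawn in Figure 1 may contain more nodes.

```c
#include <assert.h>
#include <stddef.h>

typedef const char *keytype;
typedef struct node_s {
    keytype key;
    int pred_lcp, succ_lcp;   /* lcp with predecessor / successor on the access path */
    struct node_s *left, *right;
} *node;

#define goleft(t)  if (!((t) = (t)->left))  return NULL
#define goright(t) if (!((t) = (t)->right)) return NULL

node Find(keytype x, node t) {
    int m = 0;
    assert(t != NULL);                      /* the tree is assumed nonempty */
    while (1) {
        if (x[m] < t->key[m]) {             /* case [<] */
            goleft(t);
            while (t->succ_lcp < m) goright(t);
        } else if (x[m] > t->key[m]) {      /* case [>] */
            goright(t);
            while (t->pred_lcp < m) goleft(t);
        } else {                            /* case [=]: extend the match */
            if (x[m++] == '\0') return t;
            while (x[m] == t->key[m])
                if (x[m++] == '\0') return t;
        }
    }
}

/* Hypothetical example: seacoast -> surf -> seafood -> seaside. */
node example_tree(void) {
    static struct node_s seaside  = {"seaside",  3, 1, NULL, NULL};
    static struct node_s seafood  = {"seafood",  3, 1, NULL, &seaside};
    static struct node_s surf     = {"surf",     1, 0, &seafood, NULL};
    static struct node_s seacoast = {"seacoast", 0, 0, NULL, &surf};
    return &seacoast;
}
```

On this tree, Find("seashore", root) returns NULL after matching the prefix seas, while Find("seaside", root) returns the node reached through seacoast, surf and seafood.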

2.6 Properties and Extensions 

The code can be easily modified to handle prefix searching, namely, to check
whether x is a prefix of one of the stored keys or to compute the longest
matching prefix of x. That operation is useful, for example, in text indexing
for performing full text searching and in hierarchical path strings to find a
common path of two URLs. Despite its simplicity, the code is very efficient.
Since the length of the access path in the tree is upper bounded by the height
h of the tree, the code reduces the cost of searching key x from
$O(k \cdot h)$ to $O(k + h)$ in the worst case. To see why, we observe that each
match increases the counter m and each mismatch causes the traversal of one or
more nodes in the access path.
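One possible way to obtain the prefix operations is sketched below; it is our own reading of the remark, not code from the paper. The function names longest_prefix and is_prefix_of_key are hypothetical, the search loop is restated without the goleft/goright macros so that the counter m survives the end of the access path, and example_tree builds a hand-made four-node tree with assumed lcp fields.

```c
#include <assert.h>
#include <stddef.h>

typedef const char *keytype;
typedef struct node_s {
    keytype key;
    int pred_lcp, succ_lcp;
    struct node_s *left, *right;
} *node;

/* Number of leading characters of x matched by the best-matching key
   met on the access path (the final value of the counter m). */
int longest_prefix(keytype x, node t) {
    int m = 0;
    assert(x != NULL);
    while (t) {
        if (x[m] < t->key[m]) {
            t = t->left;
            while (t && t->succ_lcp < m) t = t->right;  /* skip case-2 nodes */
        } else if (x[m] > t->key[m]) {
            t = t->right;
            while (t && t->pred_lcp < m) t = t->left;   /* skip case-2 nodes */
        } else {
            if (x[m] == '\0') return m;                 /* x is stored in full */
            m++;                                        /* extend the match */
        }
    }
    return m;
}

/* x is a prefix of some stored key iff all of its characters were matched. */
int is_prefix_of_key(keytype x, node t) {
    return x[longest_prefix(x, t)] == '\0';
}

/* Hypothetical example: seacoast -> surf -> seafood -> seaside. */
node example_tree(void) {
    static struct node_s seaside  = {"seaside",  3, 1, NULL, NULL};
    static struct node_s seafood  = {"seafood",  3, 1, NULL, &seaside};
    static struct node_s surf     = {"surf",     1, 0, &seafood, NULL};
    static struct node_s seacoast = {"seacoast", 0, 0, NULL, &surf};
    return &seacoast;
}
```

The sketch relies on the fact that, in a search tree ordered lexicographically, the stored keys sharing the longest prefix with x are its neighbors in sorted order, and both neighbors lie on the access path of x.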




#define goleft(t)  if ( !(t=t->left) )  return NULL
#define goright(t) if ( !(t=t->right) ) return NULL

node Find( keytype x, node t ) {
    int m = 0;
    while ( 1 ) {
        if ( x[m] < t->key[m] ) {
            goleft(t);
            while ( t->succ_lcp < m ) goright(t);
        } else if ( x[m] > t->key[m] ) {
            goright(t);
            while ( t->pred_lcp < m ) goleft(t);
        } else {
            if ( x[m++] == '\0' ) return t;
            while ( x[m] == t->key[m] )
                if ( x[m++] == '\0' ) return t; } } }



Fig. 1. An augmented binary search tree for strings and its search function
Find

Fact 1 Applying procedure Find for a search key x of length k in a root-to-node
access path of length h of a linked data structure requires m successful
comparisons of single characters and at most h unsuccessful comparisons of
single characters, where $m \le k$ is the length of the longest matching prefix
of x.

As previously remarked, the worst situation is for skewed strings, in which the
value of m can frequently be much larger than the expected value of
$O(\log_{|\Sigma|} n)$, typical of independent and identically distributed
strings [31, Chap. 4]. Since the average height of a tree is $O(\log n)$, we
expect that procedure Find has a cost of $O(k + \log n)$ time for n keys. What
if we use the standard strcmp routine of the C library instead? While this may
work reasonably well for random strings, the theory suggests that our
techniques can be competitive for skewed strings, since strcmp does not exploit
previously made comparisons of single characters.

We note that performing insertions is not difficult in this context. When
creating a new node s storing the search key x, we must have followed a certain
root-to-node access path $\pi$. A closer look at the code of Find reveals that
we can also compute the values of pred_lcp and succ_lcp for s. We need two extra
variables for this purpose, initialized to 0. In the last else-branch in
Figure 1, after extending a match, we have that m is the length of the longest
common prefix between x and the key in the current node. The character in
position m can tell if we will reach either the left child or the right child
of the current node in $\pi$ (the next iteration of the main while loop). In
the former case, the current node is the best candidate for being the successor
of s, and so we record m to be possibly stored in field succ_lcp of s; in the
latter case, the current node is the best candidate for being the predecessor
of s, and so we record m to be possibly stored in field pred_lcp of s. When s
is actually created at the end of $\pi$, the best candidates are employed as
the predecessor and the successor of s. Having computed the values of fields
pred_lcp and succ_lcp, we proceed by inserting s according to what is required
by the data structure at hand (e.g., restructuring it). If needed, the lcp
values can be suitably updated in $O(1)$ time per node as discussed in [12].
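A minimal sketch of insertion into an unbalanced binary search tree follows. It is not the authors' code: for clarity it tracks the two lcp values with the loop of FindB rather than with the three-way loop of Find, the names Insert, pl and sl are hypothetical (pl and sl play the role of the two extra variables), and no restructuring is performed.

```c
#include <assert.h>
#include <stdlib.h>

typedef const char *keytype;
typedef struct node_s {
    keytype key;
    int pred_lcp, succ_lcp;
    struct node_s *left, *right;
} *node;

/* Insert x into the tree rooted at *root and return the node storing it
   (the existing node if x is already present, NULL if malloc fails).
   The key is stored by pointer: the caller must keep the string alive. */
node Insert(node *root, keytype x) {
    int m = 0;    /* characters of x matched so far */
    int pl = 0;   /* lcp between x and its current predecessor on the path */
    int sl = 0;   /* lcp between x and its current successor on the path */
    node *link = root, t;
    assert(root != NULL && x != NULL);
    while ((t = *link) != NULL) {
        int lcp = (pl == m) ? t->pred_lcp : t->succ_lcp;
        if (m <= lcp) {
            while (x[m] != '\0' && x[m] == t->key[m]) m++;
            lcp = m;
        }
        if (x[lcp] == t->key[lcp]) return t;   /* both terminators: x is stored */
        if (x[lcp] < t->key[lcp]) { sl = lcp; link = &t->left; }
        else                      { pl = lcp; link = &t->right; }
    }
    t = malloc(sizeof *t);
    if (t == NULL) return NULL;
    t->key = x;
    t->pred_lcp = pl;
    t->succ_lcp = sl;
    t->left = t->right = NULL;
    *link = t;
    return t;
}
```

Inserting seacoast, surf, seafood and seaside in this order into an empty tree gives seafood and seaside the lcp pair (3, 1) and surf the pair (1, 0), in agreement with the values quoted in Section 2.1 for seafood.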

3 The Experimental Setup 

In this section we describe the experiments performed. We first briefly sketch the
techniques on which we experimented. Next we describe the data sets used, and
finally describe the experimental results. We ran our codes on several computing
platforms, but we will present only the results relative to an AMD Athlon
processor (1 GHz clock and 512 MB RAM) running Linux 2.4.0, since similar
results have been obtained on the other platforms.

3.1 The Data Structures 

We developed the ANSI C code for binary search trees [17], treaps [27], AVL
trees [2], (a, b)-trees [16], and skiplists [24]. In particular, for binary
search trees, treaps, and AVL trees we started from the implementation
described in [36], for skiplists we started from the implementation described
in [24], which is available via anonymous ftp at [25], while for (a, b)-trees
we started from our own implementation of [16, 20]. We used function fast_scmp
in the same way as described in Section 2.2, leaving the tuning to a later
stage. We extensively tested these five implementations on several data sets
and computing platforms. We omit here the results of these preliminary tests
and we only mention that the computational overhead is very small, validating
the prediction of the theory as the key length k goes to infinity. At this
stage, however, the resulting data structures were not always competitive with
the original data structures using the standard strcmp, especially for
small/intermediate key lengths. The purpose of these experiments was
identifying the most qualified implementations to be tuned as described in
Sections 2.3-2.5 and to be tested against already existing data structures. We
singled out the binary search trees and the AVL trees, obtaining their tuned
code, denoted bst and avl, respectively.

Table 1. The number of nodes and the height of the data structures considered
in the experiments for a set S of n strings. The bounds are also in terms of m,
the average length of matched prefixes in the formula for the skewness,
$(m + 1) = \mathcal{L}(S) \log_{|\Sigma|} n$ (see Section 1). The starred
bounds hold on the average

          cat             tst                       pat           bst             avl
#nodes    $O(n(m+1))$     $O(n(m+1))$               $O(n)$        $O(n)$          $O(n)$
height    $O(m+1)$*       $O(\log n + (m+1))$*      $O(m+1)$*     $O(\log n)$*    $O(\log n)$

The string data structures against which we compared are the adaptive tries of
Acharya, Zhu, and Shen [1], the ternary search trees of Bentley and
Sedgewick [5], the Patricia tries of Morrison [21], and the linear probing
hashing described by Knuth [17]. To establish a connection between the measure
of skewness $\mathcal{L}(S)$ given in Section 1 and the performance of the
above data structures for a set S of n strings, Table 1 reports their number of
nodes and their height in terms of m, the average length of matched prefixes in
the formula for the skewness, $(m + 1) = \mathcal{L}(S) \log_{|\Sigma|} n$.
Note that $m \le k$, the maximum string length.
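As a purely illustrative calculation, the relation above can be solved for the skewness. The numbers below are hypothetical and are not taken from the experiments; they only show how the quantities fit together.

```latex
\[
  \mathcal{L}(S) \;=\; \frac{m + 1}{\log_{|\Sigma|} n},
  \qquad
  |\Sigma| = 256,\quad n = 65{,}536 = 256^{2},\quad m = 9
  \;\Longrightarrow\;
  \mathcal{L}(S) = \frac{9 + 1}{2} = 5 .
\]
```

For a fixed number n of strings, a larger average matched prefix m therefore means a proportionally larger skewness.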

The cache-aware version of tries (cat) uses multiple alternative data
structures (partitioned arrays, B-trees and hashing arrays) for representing
different trie nodes, according to their fanout and to the underlying cache.
The height is $O(m + 1)$ on the average and $O(k)$ in the worst case, with
$O(n(m + 1))$ nodes on the average and $O(nk)$ nodes in the worst case. We
slightly modified their highly tuned and optimized C++ code at [1] to use our
timing routines.

The ternary search trees (tst) are a blend of tries and binary search trees.
A ternary search tree has $O(n(m + 1))$ nodes on the average and $O(nk)$ nodes
in the worst case. Its height is $O(m + \log n)$ on the average [9] and
contributes to the cost of searching, $O(k + \log n)$ time. We downloaded their
C code at [5] and slightly modified it with our timing routines.

The Patricia tries (pat) are compacted tries storing the strings in their
leaves in a total of $O(n)$ nodes, and their height is $O(k)$ in the worst case
and $O(m + 1)$ on the average. Searching in Patricia tries takes
$O(k + \log n)$ time on the average [9, 17] by comparing just one character per
node, at the price of a full comparison of the key in the node where the search
ends. We employed the implementation described in [26] and removed the
recursion.

Finally, we implemented linear probing hashing (hsh) by using the hash function
in [18] as suggested in [29], with load factor 0.5 yielding $O(n)$ occupied
space. Although hashing does not permit prefix searching, we employed its
average search time to normalize the timings of the other data structures,
believing that it may reveal their qualities better than absolute timings.

3.2 The Data Sets 

1. A word data set. It consists of two dictionaries, dictwords1 and dictwords2.
The former is available at [5]: it contains 25,481 words of average length
8.23 characters. The latter is the dictionary available on Unix machines in
file /usr/dict/words, containing 45,407 words of average length 9.01.

2. A book data set. It consists of mobydick (i.e., Melville’s “Moby Dick”
available from [23]) with 212,891 words of average length 6.59 and of
warandpeace (i.e., Tolstoy’s “War and Peace” available from the same URL) with
565,551 words of average length 5.64.

3. A library call number data set. Used in one of the DIMACS Implementation
Challenges, it is available from [28]. Each entry is a library card number (such
as WGER 2455 55 20). From this data set, we picked circ2.10000, which consists
of 9,983 strings of average length 22.52, and circ2.100000, which consists of
100,003 strings of average length 22.54. We also generated libcall from other
files, consisting of 50,000 strings of average length 22.71.

4. A source code data set. Here, keys are the code lines. We chose gprof,
the source code of a profiler consisting of 7,832 code lines of average length
25.87, kernel, the source code of a Linux kernel consisting of 48,731 code
lines of average length 29.11, and ld, the source code of a linker containing
19,808 lines of average length 27.77.

5. A URL data set. Each key is a uniform resource locator in the Web, where
we dropped the initial http:// prefix. Files range from 25,000 to 1,500,000
URLs. The URLs were collected in the Web domain *.*.it by a spider for
the search engine in [3] and by the proxy server at the University of Pisa [7].

The dictionary and book data sets in points 1 and 2 tend to have the smaller
values of skewness $\mathcal{L}(S)$: they consist of many words having small
average lengths. On the other extreme of the spectrum we have the source code
and the URL data sets in points 4-5, which are characterized by large values of
$\mathcal{L}(S)$. In between, we find the library call number data sets in
point 3.






Pilu Crescenzi et al. 



Table 2. Running times of the insert operation for the data sets S in order of
skewness $\mathcal{L}(S)$. The running times are normalized with the
corresponding ones of hsh, whose value is hence 1 in each entry (not reported)

$\mathcal{L}(S)$     cat     pat     tst     bst     avl
  1.98               1.00    3.33    3.01    2.33    3.00
  2.53               0.88    2.00    1.34    1.55    2.05
  4.91               1.93    3.00    2.57    2.53    3.61
  5.60               2.58    1.7     2.66    1.25    1.67
 10.95               2.56    4.62    4.10    3.49    4.42
 11.75               2.69    3.66    3.60    2.72    3.56
 12.94               3.26    3.06    3.41    2.25    3.03
 14.70               3.86    2.66    3.26    1.93    2.60
 16.64               5.48    2.57    3.28    1.71    2.14
 19.14               4.00    3.15    3.73    2.31    3.03
 22.26               4.80    2.66    3.56    1.93    2.43
 26.07               5.14    2.42    3.47    1.67    2.04



3.3 The Results 

We ran several experiments on our data sets. One experiment used two
different data sets. The first data set (e.g., a dictionary, such as dictwords1
or dictwords2) was inserted into the data structure at hand, one word at a time.
Against this data set, we ran searches using all the words of the second data
set (e.g., a book such as mobydick and warandpeace). The experiment thus
consisted of carrying out a batch of insertions followed by a batch of lookups.
We performed several experiments of this type, according to different data sets,
and to whether the data sets were scrambled randomly or not. As a special case
of this experiment, we also used the same data set twice: i.e., we first inserted
the words of a data set (randomly scrambled or not) one at a time in the data
structure at hand; next, we scrambled the same data set, and searched the data
structure for each word in this random order. The goal of the data scrambling was
to avoid sorted data, and to force the batches of insertions and lookups to follow
completely different patterns in the data structure. We considered the searches
both in the successful case (the key is stored in the trees) and in the unsuccessful
case (the key is not stored). We ran each experiment several times and retained
the average of their running times. Times were measured in milliseconds with
the Unix system call getrusage().
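The measurement procedure can be sketched as follows. This is only an illustration under stated assumptions: the helper names cpu_ms and scramble are hypothetical, the sketch reads user CPU time only (the text does not say whether system time was included), and the shuffle uses the C library rand(), which may differ from the generator used in the experiments.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/resource.h>

/* User CPU time consumed so far by this process, in milliseconds. */
double cpu_ms(void) {
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    (void)rc;                               /* silence unused warning under NDEBUG */
    return ru.ru_utime.tv_sec * 1000.0 + ru.ru_utime.tv_usec / 1000.0;
}

/* Fisher-Yates shuffle of an array of n string pointers. */
void scramble(const char **a, size_t n) {
    size_t i;
    for (i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;      /* slight modulo bias, fine for a sketch */
        const char *tmp = a[i - 1];
        a[i - 1] = a[j];
        a[j] = tmp;
    }
}
```

A batch is then timed by calling cpu_ms() before and after it and taking the difference; averaging over several runs, as described above, smooths out the coarse resolution of the clock.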

The running times of the insert operations, the successful searches and the
unsuccessful searches are reported in Tables 2-4, respectively. All these
experiments yielded very similar results. Indeed, across different experiments
there seemed to be very little difference in the relative behavior of our
algorithms (even though, clearly, the absolute CPU times were substantially
different). What we learned from our experiments is that cat appeared to be
fast on large data sets with small keys (i.e., sets S with small skewness
$\mathcal{L}(S)$). On the opposite side, bst and avl produced data structures
which were effective on data sets with skewed keys (i.e., with large
$\mathcal{L}(S)$). In between, the performance of these data structures did not
differ much, depending on the underlying architecture: however, in all our
experiments tst started being competitive with the other two approaches exactly
in this range. Note that pat was sometimes less competitive because of the
double traversal needed by the search and update operations.

Table 3. Running times of the successful search for the data sets S in order
of skewness $\mathcal{L}(S)$. The running times are normalized with the
corresponding ones of hsh

$\mathcal{L}(S)$     cat     pat     tst     bst     avl
  1.98               0.30    3.33    3.04    2.33    2.67
  2.53               0.55    2.11    1.27    1.47    1.94
  4.91               0.96    2.85    2.44    2.62    3.14
  5.60               0.92    1.33    1.93    1.00    1.40
 10.95               1.15    4.16    3.63    3.19    3.58
 11.75               1.21    3.29    3.18    2.52    2.87
 12.94               1.49    2.62    2.91    2.05    2.43
 14.70               1.77    2.11    2.61    1.66    2.11
 16.64               2.10    1.55    2.34    1.33    1.55
 19.14               1.65    2.46    2.92    1.92    2.36
 22.26               1.77    1.89    2.64    1.53    1.97
 26.07               2.18    1.70    2.55    1.33    1.74

An explanation of the experimental behavior of these data structures and
their relation to the skewness $\mathcal{L}(S)$ can be found in Table 1, in
which their number of nodes models the space occupancy and their height models
the time performance. First, notice that we can classify the data structures
into three classes: tries (cat, tst), compacted tries (pat), and lexicographic
trees (tst, bst, avl). The members of the former two classes have depth
proportional to the average length of matched prefixes, m, while the members of
the latter class have a depth proportional to the logarithm of the number of
strings, n, with tst being a hybrid member of two classes. Second, their
performance is influenced by the skewness $\mathcal{L}(S)$, as larger values of
$\mathcal{L}(S)$ cause larger values of m for fixed n.

With the two observations above, we now discuss the experimental behavior
of the data structures. When $\mathcal{L}(S)$ is small, the (compacted) tries
have depth smaller than $O(\log n)$ and so their performance is superior, with
the exception of pat, which requires a double traversal. Also the number of
nodes is nearly $O(n)$. When $\mathcal{L}(S)$ is large, the lexicographic trees
have depth smaller than $O(m)$, and they compare better. They always require
$O(n)$ nodes, while the others do not guarantee this upper bound for skewed
strings or have worse time performance. For intermediate values of
$\mathcal{L}(S)$, tries and lexicographic trees are of comparable heights, and
the data structures behave reasonably well. It turns out that cat is very fast
for small skewness; tst is more competitive for middle skewness; avl and bst
are the choice for large skewness; pat is sometimes less efficient for
searching and inserting; hsh is the fastest in nearly all the cases tested, at
the price of supporting less powerful operations. In general, cat takes
significantly more space than the other data structures when the strings are
skewed.

Table 4. Running times of the unsuccessful search (not reported for cat) for
the data sets S in order of skewness $\mathcal{L}(S)$. The running times are
normalized with the corresponding ones of hsh

$\mathcal{L}(S)$     pat     tst     bst     avl
  1.98               3.00    3.02    3.00    4.02
  2.53               3.02    2.50    3.01    3.01
  4.91               2.25    1.75    2.50    3.00
  5.60               1.33    1.66    1.00    2.01
 10.95               3.88    3.17    3.30    3.80
 11.75               2.79    2.45    2.39    2.87
 12.94               2.43    2.43    2.12    2.68
 14.70               1.77    1.88    1.33    1.88
 16.64               2.00    2.33    2.00    2.66
 19.14               2.43    2.43    2.10    2.66
 22.26               2.06    2.25    1.62    2.18
 26.07               1.63    1.91    1.36    1.90

In summary, for $m < \alpha_1 \log_{|\Sigma|} n$, for some constant $\alpha_1$, we expect cat to be
the fastest, and for $m > \alpha_2 \log_{|\Sigma|} n$, for some other constant $\alpha_2 > \alpha_1$, we
expect avl and bst to take over, while tst is good in the middle range. These
constants are highly machine-dependent, as expected. In our case, the search
running times were from 1 to 4 times slower than hashing, while the insert
running times were from 1 to 5 times slower. The maximum speedup, computed as
the maximum ratio between the running times of the slowest and the fastest
algorithms for each data set (except for the first two in Table 3), is around 2
or 3. Although it may not appear as impressive as in other problems, we point
out that the involved data structures are highly tuned. When employed in a
high-performance server for indexing, sorting and compressing textual data,
doubling the actual number of served requests per time unit can make a
difference in practice.

Some final remarks are in order. Search time in tst is rarely worst-case
compared to that of bst, and tst typically has smaller height than bst. The
extra work for keeping avl balanced apparently does not pay. As for the
required space, cat and tst take more nodes than the others, which need the
strings to be stored explicitly somewhere else. In some cases, tst may require
globally less space (including that required by the strings) because it can
store the strings in its structure. Being so simple, tst is the method of choice
for tries [9], while our experiments indicate that bst is an effective choice
when compacted tries are needed, especially with skewed strings.

4 Conclusions

We described a general data structuring technique that, given a linked data
structure for storing atomic keys undergoing comparisons, transforms it into a
data structure for strings. Despite the simplicity and generality of our method,
our experimental study shows that it is quite competitive with some
implementations currently available in the literature. We can extend our
technique to obtain text indexing data structures. Hence, our augmented data
structures provide a good alternative to popular text indexes such as suffix
arrays and suffix trees [15]. They can also perform the suffix sorting of a
text, which is the basis of text indexing and data compression.

Abstract. In this paper we analyze empirically the performance of several
protocols for a random model of sensor networks that communicate through
optical links. We provide experimental evidence of the basic parameters and of
the performance of the distributed protocols described and analyzed under
asymptotic assumptions in [3] for relatively small random networks (1000 to
15000 sensors).



1 Introduction 

The fast development of microelectronics and communications has opened a vast
field in the development of low-cost, low-power, small-size, multi-functional
sensors. Large numbers of these sensors can be spread to form networks of
sensors [1]. Sensor networks are supposed to be deployed in hostile
environments, in order to permit monitoring and tracking of remote objects and
detecting anomalous situations such as forest fires, seismic activity, and
others. Communication among the sensors is usually done by radio frequency or,
in more recent projects, by optical transmission via small lasers (see, e.g.,
[4, 5, 8]). Free-space optical links require an uninterrupted line-of-sight path
for communication, but avoid radio interference. In particular, the Smart Dust
project at Berkeley [6, 8] has proposed the use of optical communication between
sensors as an alternative to radio frequency. Both in the case of radio
frequency and of optical communication, energy consumption is a key factor in
the evaluation of these networks.

Recent efforts have been made to formalize and to give more efficient
algorithms for networks of sensors, but these efforts have mainly concentrated
on models using radio frequency communication [2, 7]. In [3], we proposed a
probabilistic model to analyze smart dust networks communicating through
optical devices. Although simple, our model seems to incorporate the basic
technological specifications of smart dust systems [5]. We also considered and
analyzed there some basic communication protocols in the proposed model, taking
into account power management as a main issue in their design. Such protocols
represent a research challenge, due to the optical nature of these networks and
because of the presence of faulty connections among sensors. However, the
analytical results obtained in [3] were mostly of an asymptotic nature. In the
present work, we experimentally study whether these asymptotic results hold for
smaller networks.

* Work partially supported by the IST Programme of the EU under contract number
IST-2001-33116 (FLAGS) and by the Spanish CICYT project TIC-2001-4917-E.

Fig. 1. Elements in the motes of the Smart Dust project (reproduced from [5])

Recall that the setting of smart dust systems is to have a base-station
transceiver (bts) at a relative elevation (or in a small plane) periodically
monitoring the information of a large number of sensors (motes) that have been
scattered on a terrain. Motes include sensor devices, a small laser cannon and a
set of optical devices able to modulate and reflect the light they receive.
Fig. 1 reproduces the design of such a mote.

The scenario considered in [3] is as follows. The motes are scattered massively
at random from a vehicle. As a consequence, some of them may break or fall in
such a way that they cannot communicate with the bts. Then, the communication
is initiated from the bts: the bts scans an area with its laser, and each
mote passively modulates and reflects the beam. This is the preferred way to
communicate, as the mote uses little power. If a mote is shadowed from the bts,
it must relay its information to the bts through other motes (any mote can
detect a failure of communication with the bts by noticing that sufficient time
has passed without communication). Communication between motes is done in an
active way using their laser beams. This kind of transmission uses more power.
To send information, motes use an orientable low-power laser beam, which current
technology allows to be steered sideways and upward by about forty degrees. To
receive information, motes have an optical device able to detect and interpret
laser signals, as well as to evaluate the direction of the incoming beam.

The network is established in two phases. The localization phase consists
in finding out the position of each mote; that is, at the end of this phase it
is required that all operative motes know their approximate coordinates (GPS
systems cannot currently be used because of their volume, price and power
consumption). The second phase consists in establishing a route from the motes
that cannot communicate directly with the bts through the motes that can.
This phase is called route establishment and is attained by repeated operations
of broadcasting and gathering protocols. Once these two main steps are
performed, the network is able to enter its exploitation phase, which will last
until the power supply of the motes decays.

In this paper, we give empirical evidence that the system and protocols de- 
scribed and analyzed in [3] perform correctly, with only a small increase in the 
predicted constants, when dealing with networks of a few thousand sensors. 



2 The Model 

In this section, we recall the random model for smart dust systems that was 
proposed in [3]. 

According to [3], we deal with an ideal two-dimensional square region in
which a mote falls with coordinates following the uniform distribution. Any mote
can orient its laser cannon in any position of its scanning area, which covers
a sector of a fixed angle of $\alpha$ radians. The sector is also oriented randomly
with an angle $\beta$ relative to some horizon, but the mote can receive light from
any point, within distance $r$, which is “looking” at it. We assume that the
terrain to be covered is a square area of size $D \times D$. An appropriate scaling
can map it to the unit square $[0, 1]^2$.

These considerations give rise to a directed graph model. We define a random
digraph, where the vertices are the motes, and given vertices $i$ and $j$, there
is an arc $(i, j)$ if $j$ lies in the sector with center $i$ defined by $r$, starting
at angle $\beta_i$ and ending at angle $\beta_i + \alpha$ (modulo $2\pi$). More specifically, let
$X = (X_i)_{i \ge 1}$ be a sequence of independently and uniformly distributed
(i.u.d.) random points in $[0, 1]^2$, let $B = (\beta_i)_{i \ge 1}$ be a sequence of i.u.d.
angles and let $(r_i)_{i \ge 1}$ be a sequence of numbers in $[0, 1]$. For any natural
$n$, we write $X_n = \{X_1, \dots, X_n\}$ and $B_n = \{\beta_1, \dots, \beta_n\}$. We call
$G_\alpha(X_n, B_n, r_n)$ the random scaled sector graph with $n$ nodes.
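
The arc rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a single common radius `r` is used for all motes, the sector is taken to span `alpha` radians counter-clockwise from `beta_i`, and the names `in_sector` and `random_sector_graph` are hypothetical (they do not come from [3]). The construction is quadratic in the number of motes, which is adequate for small experiments only.

```python
import math
import random

def in_sector(xi, xj, beta_i, alpha, r):
    """True if point xj lies in the sector of radius r centred at xi that
    starts at angle beta_i and spans alpha radians (counter-clockwise is
    an assumption of this sketch)."""
    dx, dy = xj[0] - xi[0], xj[1] - xi[1]
    if dx == 0 and dy == 0:
        return False
    if math.hypot(dx, dy) > r:
        return False
    theta = math.atan2(dy, dx) % (2 * math.pi)
    # Angular offset from the start of the sector, taken modulo 2*pi.
    return (theta - beta_i) % (2 * math.pi) <= alpha

def random_sector_graph(n, alpha, r, seed=0):
    """Sample n uniform points in the unit square with uniform sector
    orientations and return (points, orientations, arcs)."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    betas = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    arcs = {
        i: [j for j in range(n)
            if j != i and in_sector(pts[i], pts[j], betas[i], alpha, r)]
        for i in range(n)
    }
    return pts, betas, arcs
```

Note that the resulting digraph is in general not symmetric: mote i may see mote j without j seeing i, which is why strong connectivity is the relevant notion in what follows.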

The objective of the network is monitoring the terrain. To do so, we assume
that the terrain is dissected in a grid of $s$ cells, each of size
$1/\sqrt{s} \times 1/\sqrt{s}$. This grid serves as reference position for the motes, and the
data reported from a mote refers to the cell in the grid that contains it. The
parameter $s$ is selected to represent the sensing precision of the network and
controls its scalability.

In the process of scattering the motes, some of them may break down or fall
upside down and become inoperative, some may be hidden from the bts and for
some pairs of motes the line of sight may be blocked. We assume that motes fall
uniformly at random in the terrain and that each mote can fail, independently
at random, with probability $1 - p_o$. Operative motes communicate with the bts
with probability $p_b$. Also, we assume that $p_c$ is the probability that the line
of sight allowing communication from one operative mote to another operative
mote is not interrupted by any obstacle.

We use the following parameters to describe a smart dust network: $n$ is the
number of motes; $r$ is the laser range of the motes; $\alpha$ is the laser scanning
angle; $s$ is the number of cells in the grid; $p_o$ is the probability that a mote
is operative; $p_b$ is the probability that an operative mote can communicate with
the bts (these are called communicating motes); $p_c$ is the probability that the
line of sight between two motes is not interrupted. We consider here that $p_o$,
$p_c$ and $p_b$ are fixed probabilities.

As defined in [3], the random smart dust network
$\mathcal{SD}_n(X, B, \alpha, r, s, p_o, p_b, p_c)$ is a network of $n$ motes $X_1, \dots, X_n$, where
$X_i$ is operative with probability $p_o$, can communicate with the bts with
probability $p_b$, and $X_i$ and $X_j$ can communicate with probability $p_c$ if $X_j$ is
contained in the sector of radius $r$ and angle $\alpha$ starting at $\beta_i$. In the
remainder of the paper, such a system will be simply denoted as $\mathcal{SD}_n$.

We will use the term $\varepsilon$-normalized random smart dust system to refer to
a random smart dust system $\mathcal{SD}_n$ with $n = (1 + \varepsilon)(s \ln s)/p_o$ motes and such
that g = rj ^/s is a constant and e > cq. Here, eo is a constant that depends on g. 
This selection of n is made according to [3], where it is stated that, with high 
probability, a random smart dust system with n motes can effectively monitor s 
cells. Given the laser range r and the number of cells s, the value g = rfy^ 
is another relevant parameter that will appear in the analysis of random smart 
dust systems. In our experiments we take g = 10. 
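
As a rough illustration of the sizing rule n = (1 + ε)(s ln s)/p_o used above, the number of motes can be computed directly from s, ε and p_o. The function name `motes_needed` and the rounding up to an integer are assumptions of this sketch; they are not taken from [3].

```python
import math

def motes_needed(s, eps, p_o):
    """Return n = (1 + eps) * s * ln(s) / p_o, rounded up to an integer
    (the rounding is an assumption made for this illustration)."""
    if s < 1 or not (0 < p_o <= 1):
        raise ValueError("need s >= 1 and 0 < p_o <= 1")
    return math.ceil((1 + eps) * s * math.log(s) / p_o)
```

For a fixed grid, halving the probability p_o of being operative roughly doubles the number of motes that must be scattered.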

Most of the results relate to interior motes, defined as those motes whose
distance to the boundaries of $[0, 1]^2$ is greater than $r$.

3 Connectivity 

It is known that, with high probability, the interior motes of normalized smart 
dust systems are strongly connected: 

Theorem 1 ([3]). Let $\mathcal{SD}_n$ be an $\varepsilon$-normalized smart dust system. Then, w.h.p.,
for all pairs $(x, y)$ of interior operative motes, there is a directed path from
$x$ to $y$.

Recall that a sequence of events $(E_n)_{n \ge 0}$ occurs with high probability (w.h.p.)
if $\lim_{n \to \infty} \Pr[E_n] = 1$. As a consequence, the result in the above theorem is just
asymptotic and does not provide any clue on how fast the probability converges
to 1. Our first set of experiments provides an analysis of the network
connectivity. The goal is to simulate smart dust networks in order to know
whether the predicted asymptotic behavior holds in small networks for different
values of its parameters.

To do so, we have generated different random networks with $\alpha = 40$ degrees
and have measured the number of interior motes that belong to the largest
strongly connected component for different values of $n$, $p_o$ and $p_c$. The number
of motes ($n$) has been selected in the range 1000 to 15000 with increments of
1000; we believe that this is a reasonable number of motes. The probability of
communication between motes ($p_c$) has been selected in the range 0 to 1, with
increments of 0.1, in order to cover a wide spectrum of possibilities. In this
and all the following experiments, we have taken $r_n = \log n/\sqrt{n}$ for the
radius; the rationale behind this selection is that the results in [3] hold for
$r_n \ge c_0 \sqrt{(\log n)/n}$ for $c_0$ large enough, and our selection is just slightly
greater than this bound for any $c_0$. In all the experiments, we have also
restricted the values for the probability of being operative ($p_o$). We have
selected three different values: 0.15, 0.5 and 0.85; these represent low, medium
and high probabilities.

The results are presented in Table 1, which shows the percentage of interior
motes that belong to the largest strongly connected component relative to the
total number of interior motes. The experiments have been repeated 200 times,
without noticing large deviations from the averages, which are shown in the
tables.
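
The measured quantity can be sketched as follows. The code assumes that the network is given as a dictionary `arcs` mapping each mote index to the list of motes it can reach, that `pts` lists the coordinates in the unit square, and that the largest strongly connected component is computed over all motes before counting the interior ones; these choices and the function names are assumptions of this illustration, not a description of the original simulator. Kosaraju's algorithm is used here simply because it is short to write iteratively.

```python
def strongly_connected_components(arcs):
    """Kosaraju's algorithm (iterative). arcs: dict node -> list of successors;
    every successor must also be a key of arcs."""
    order, seen = [], set()
    for s in arcs:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(arcs[s]))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(arcs[w])))
                    break
            else:
                # All successors explored: v is finished.
                order.append(v)
                stack.pop()
    # Reverse graph.
    rev = {v: [] for v in arcs}
    for v, ws in arcs.items():
        for w in ws:
            rev[w].append(v)
    comps, assigned = [], set()
    for s in reversed(order):
        if s in assigned:
            continue
        comp, stack = [], [s]
        assigned.add(s)
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in rev[v]:
                if w not in assigned:
                    assigned.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps

def interior_fraction_in_largest_scc(arcs, pts, r):
    """Fraction of interior motes (distance to the boundary > r) that belong
    to the largest strongly connected component."""
    interior = {i for i, (x, y) in enumerate(pts)
                if min(x, y, 1 - x, 1 - y) > r}
    if not interior:
        return 0.0
    largest = max(strongly_connected_components(arcs), key=len)
    return len(interior.intersection(largest)) / len(interior)
```

Averaging this fraction over repeated random networks gives one entry of a table such as Table 1.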

As could be expected, the results show that as $n$, $p_c$ or $p_o$ grow, the fraction
of motes in the largest strongly connected component grows. More importantly,
the results show that for certain values of the parameters, almost 100% of the
pairs of interior motes are connected. This shows that the asymptotic behavior
predicted in Theorem 1 can be observed even for relatively small values of the
parameters $n$, $p_c$ and $p_o$.

4 The Localization Algorithm 

Once deployed, the first task for a smart dust network is to find the position
of each operative interior mote. Motes that can communicate with the bts will
receive their coordinates from it (it suffices to include the emission angle in
the bts message [5]). The remaining motes will have to compute their coordinates
based on the coordinates of other motes and the angle of incidence of the
incoming laser beams. In this section, we simulate the localization algorithm
localize proposed in [3] to solve the localization stage for optical smart dust
systems. We refer to [3] for a complete description of this protocol. The
expected performance and runtime of the localize protocol are given by the
following result:

Theorem 2 ([3]). Let $\mathcal{SD}_n$ be an $\varepsilon$-normalized random smart dust system.
Then, as $s$ grows, w.h.p., after a constant number of phases of the localize
protocol, all operative interior motes know their position.

Again, the result is asymptotic. Moreover, the required number of steps is
unknown. Therefore, our goal in this section is two-fold: we want to know whether
the localization phase can be performed with the localize algorithm when the
number of motes is on the order of a few thousand, and we want to know how
many phases are needed.

In this case, the probability $p_b$ of reaching the bts matters. Therefore, the
parameters for these simulations involve $n$, $p_b$, $p_o$ and $p_c$. Again, we take
$p_b \in \{0.15, 0.50, 0.85\}$. Tables 2, 3 and 4 show the empirical results obtained
using the localize protocol. From these values, it can be observed that the
localization phase will succeed when $p_c$, $p_b$ or $p_o$ are high, but that for a low
number of motes and low values of $p_c$, $p_b$ or $p_o$ the algorithm fails to assign
coordinates to each interior operative mote. In these cases, though, more than
90% of the interior motes could compute their position. With regard to the
number of phases needed to complete the algorithm, our results indicate that, on
average, 6 phases are enough and that never more than 20 phases have been
required.

5 Broadcasting from the bts to the Motes

Another issue to be considered is the broadcasting of messages originated
in the bts. We consider the simple-bro protocol given in [3], which is a classical
flooding algorithm with multiple source points. The theoretical analysis of the
broadcasting protocol in the random smart dust network yields the following
result:

Theorem 3 ([3]). Let $\mathcal{SD}_n$ be an $\varepsilon$-normalized random smart dust system.
Then, as s grows, w.h.p., after performing 4\ ^2po/{pb{^ + phases of the 

simple-bro protocol, a message originated in the bts will be broadcast to all
interior motes.

Using the same parameters as in the experiments of the previous section,
Tables 5, 6 and 7 show the percentage of interior motes which receive the message
transmitted by the bts using the simple-bro protocol, within the number of
phases indicated in Theorem 3. Again, for reasonable values of $p_o$ and $p_b$, we
can observe that as $n$ and $p_c$ grow, the percentage of interior motes receiving
the message originated in the bts in the prescribed number of phases soon goes
over 95%.
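
Since simple-bro is described as classical flooding with multiple source points, the measurement above can be approximated by a multi-source breadth-first search that is cut off after a given number of phases. The sketch below is an assumption-laden simplification: it ignores the scanning and power-management details of the actual protocol in [3], takes the phase budget `max_phases` as a plain parameter, and uses the hypothetical name `flood`.

```python
def flood(arcs, sources, max_phases):
    """Return the set of motes holding the message after at most max_phases
    phases of flooding. arcs: dict mote -> iterable of motes it can reach;
    sources: motes that receive the message directly from the bts."""
    reached = set(sources)
    frontier = set(sources)
    for _ in range(max_phases):
        # Every mote informed in the previous phase forwards the message.
        nxt = {w for v in frontier for w in arcs.get(v, ()) if w not in reached}
        if not nxt:
            break
        reached |= nxt
        frontier = nxt
    return reached
```

Dividing the number of interior motes in the returned set by the total number of interior motes gives a percentage comparable to the entries of Tables 5, 6 and 7.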



6 Route Establishment from the Motes to the bts 



Before exploiting the network of sensors, a smart dust system must establish
a routing so that any operative interior mote can send its information to the
bts. Motes that can communicate directly with the bts will simply transmit
their information passively when the bts queries them. The remaining motes
will have to send their information actively by way of multiple hops. Since
the communication between motes is not bidirectional, we cannot simply reverse
the paths found by the previous algorithm.

To establish a route from the motes to the bts, we use the protocol simple-link
described in [3]. That protocol computes a set of routes from the motes to
the bts that form an oriented forest with roots in the communicating motes.



Theorem 4 ([3]). Let $\mathcal{SD}_n$ be an $\varepsilon$-normalized random smart dust system.
Then, as $s$ grows, w.h.p., after performing 4/ phases of the simple-link
protocol, all interior motes have selected a neighbour to whom to send the
information to be forwarded to the bts.



The important part in the analysis of the previous algorithm done in [3] was
the fact that, asymptotically, almost all pairs of motes at Euclidean distance
less than $r$ are at distance at most 4 in the network. Therefore, in this case,
our experiments focus on this property rather than on the whole simulation of
the simple-link protocol. Using the same parameters as in the previous
experiments, Tables 8 and 9 present the measured results. (Unfortunately, we
could not compute shortest paths in networks with more than 5000 motes when
$p_o = 0.85$.) In any case, these show that, for the smaller number of nodes that
we are dealing with, the number of phases of broadcasting used in each
simple-link phase must be increased to 6.

7 Conclusions

In this paper, we have investigated whether the asymptotic results we drew in
[3] hold for normalized smart dust systems with thousands of motes.

Our experiments have focussed on the simulation of the basic protocols
presented in [3] and span several basic tasks to exploit networks of sensors.
The parameters we have used to study our protocols are the size of the network
($n$) and the different failure probabilities ($p_o$, $p_c$ and $p_b$). We have reported
results for $\alpha = 40°$, but simulations with $\alpha = 20°$ present a similar behavior.

The results we have reported show that the predicted asymptotic behavior
can be detected on small networks by setting the probabilities to reasonably
high values, at the expense of a slight increase of the constants. Observe that
this increase of the constants only lengthens the broadcasting, but does not
result in any additional energy consumption, which remains at one full scan per
mote and per phase of the simple-link protocol.

Abstract. Recently two different linear time approximation algorithms
for the weighted matching problem in graphs have been suggested.
Both these algorithms have a performance ratio of 1/2. In this paper we
present a set of local improvement operations and prove that it guarantees
a performance ratio of 2/3. We show that a maximal set of these
local improvements can be found in linear time.

To see how these local improvements behave in practice we conduct an
experimental comparison of four different approximation algorithms for
calculating maximum weight matchings in weighted graphs. One of these
algorithms is the commonly used Greedy algorithm which achieves a
performance ratio of 1/2 but has $O(m \log n)$ runtime. The other three
algorithms all have linear runtime. Two of them are the above mentioned 1/2
approximation algorithms. The third algorithm may have an arbitrarily
bad performance ratio but in practice produces reasonably good results.
We compare the quality of the algorithms on a test set of weighted graphs
and study the improvement achieved by our local improvement operations.
We also do a comparison of the runtimes of all algorithms.



1 Introduction 

A matching $M$ in a graph $G = (V, E)$ is defined to be any subset of the edges
of $G$ such that no two edges in $M$ are adjacent. If $G = (V, E)$ is a weighted graph
with edge weights given by a function $w : E \to \mathbb{R}_+$, the weight of a matching
is defined to be $w(M) := \sum_{e \in M} w(e)$. The weighted matching problem is to
find a matching $M$ in $G$ that has maximum weight. Calculating a matching of
maximum weight is an important problem with many applications. The fastest
known algorithm to date for solving the weighted matching problem in general
graphs is due to Gabow and has a runtime of $O(|V||E| + |V|^2 \log |V|)$.

Many real world problems require graphs of such large size that the runtime
of Gabow’s algorithm is too costly. Examples of such problems are the
refinement of FEM nets, the partitioning problem in VLSI design, and the
gossiping problem in telecommunications. There also exist applications where
the weighted matching problem has to be solved extremely often on only
moderately large graphs. An example of such an application is the virtual
screening of protein databases containing the three-dimensional structure of the
proteins. The graphs appearing in such applications only have about 10,000
edges. But the weighted matching problem has to be solved more than 100,000,000
times for a complete database scan.

* Supported by DFG research grant 296/6-3.

K. Jansen et al. (Eds.): WEA 2003, LNCS 2647, pp. 107-119, 2003.

(c) Springer-Verlag Berlin Heidelberg 2003

Therefore, there is considerable interest in approximation algorithms for the 
weighted matching problem that are very fast, having ideally linear runtime, and 
that nevertheless produce very good results even if these results are not optimal. 

The quality of an approximation algorithm for solving the weighted matching
problem is measured by its so-called performance ratio. An approximation
algorithm has a performance ratio of $c$ if for all graphs it finds a matching
with a weight of at least $c$ times the weight of an optimal solution. Recently
two different linear time approximation algorithms for the weighted matching
problem in graphs have been suggested. Both these algorithms have a performance
ratio of 1/2. In this paper we present a set of local improvement operations and
prove that it guarantees a performance ratio of 2/3. We show that a maximal
set of these local improvements can be found in linear time.

The performance ratio only gives information about the worst case behaviour
of an algorithm. In this paper we will also study how several algorithms for the
weighted matching problem behave in practice. We make an experimental
comparison of the three known approximation algorithms for the weighted matching
problem that have a performance ratio of 1/2. We also include a simple,
extremely fast heuristic that cannot guarantee any performance ratio but behaves
reasonably well in practice. In addition we apply our local improvement
operations to all these algorithms and test how much improvement they yield in
practice.

2 Local Improvements 

The idea of local improvements has been used in several cases to improve the
performance ratio of approximation algorithms.

In the case of the unweighted matching problem, which is usually called the
maximum matching problem, it is well known that a given matching can be
enlarged by local improvements. In this case the local improvements are
augmenting paths, i.e. paths that alternately consist of edges contained in a
matching $M$ and not contained in $M$ such that the first and the last vertex of
the path are not contained in an edge of $M$. From a result of Hopcroft and Karp
it follows that if $M$ is a matching such that a shortest augmenting path has
length at least $l$ then $M$ is an $\frac{l-1}{l+1}$ approximation of a maximum matching.

We extend the notion of augmenting paths to weighted matchings in a natural
way. Let $G = (V, E)$ be a weighted graph with weight function $w : E \to \mathbb{R}_+$ and
$M \subseteq E$ be an arbitrary matching in $G$. A path or cycle is called $M$-alternating
if it uses alternately edges from $M$ and $E \setminus M$. Note that alternating cycles
must contain an even number of edges. Let $P$ be an alternating path such that if
it ends in an edge not belonging to $M$ then the endpoint of $P$ is not covered by
an edge of $M$. The path $P$ is called $M$-weight-augmenting if

$$w(E(P) \cap M) < w(E(P) \setminus M).$$

Fig. 1. The seven local improvements. Edges belonging to the matching are
shown in bold. Hollow vertices are vertices not contained in any matching edge



If $P$ is an $M$-weight-augmenting path then $M \triangle P$ (the symmetric difference
between $M$ and $P$) is again a matching with strictly larger weight than $M$. The
notion of $M$-weight-augmenting cycles is defined similarly.
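
A minimal sketch of this definition, assuming edges are stored as two-element frozensets and weights in a dictionary `w` (both representation choices, and the function names, are assumptions of this illustration). The check only compares the two weights; the caller is assumed to supply a valid $M$-alternating path whose end vertices satisfy the condition stated above.

```python
def edge(u, v):
    """Undirected edge as an order-independent key."""
    return frozenset((u, v))

def is_weight_augmenting(path_edges, matching, w):
    """True if the edges of the path inside the matching weigh strictly less
    than the edges of the path outside the matching."""
    in_m = sum(w[e] for e in path_edges if e in matching)
    out_m = sum(w[e] for e in path_edges if e not in matching)
    return in_m < out_m

def augment(matching, path_edges):
    """Symmetric difference of the matching and the path's edge set."""
    return set(matching) ^ set(path_edges)
```

For the path a, b, c, d with weights 3, 4, 3 and $M$ consisting of the middle edge, the path is weight-augmenting and augmenting replaces the middle edge (weight 4) by the two outer edges (total weight 6).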

In the following we will consider $M$-weight-augmenting cycles of length 4,
$M$-weight-augmenting paths of length at most 4 and $M$-weight-augmenting paths
of length 5 that start and end with an edge of $M$. See Fig. 1 for all
possibilities of such weight-augmenting paths and cycles.

The following result shows that the non-existence of short M-weight- 
augmenting paths or cycles guarantees that the weight of M is at least 2/3 
of the maximum possible weight. 

Theorem 1. Let $M_{opt}$ be a maximum weight matching in a weighted graph
$G = (V, E)$ with weight function $w : E \to \mathbb{R}_+$. If $M$ is a matching in $G$ such
that none of the seven operations shown in Fig. 1 increases the weight of $M$ then

$$w(M) \ge \frac{2}{3} \cdot w(M_{opt}).$$

Proof. Consider the graph induced by the symmetric difference $M \triangle M_{opt}$. It
consists of even alternating cycles $C_i$ and alternating paths $P_i$. We will show
that

$$w(C_i \cap M) \ge \frac{2}{3} \cdot w(C_i \cap M_{opt}) \quad \forall i \qquad (1)$$

and

$$w(P_i \cap M) \ge \frac{2}{3} \cdot w(P_i \cap M_{opt}) \quad \forall i. \qquad (2)$$

Equations (1) and (2) imply

$$\begin{aligned}
w(M) &= w(M \cap M_{opt}) + \sum_i w(C_i \cap M) + \sum_i w(P_i \cap M) \\
     &\ge \frac{2}{3} \cdot w(M \cap M_{opt}) + \frac{2}{3} \cdot \sum_i w(C_i \cap M_{opt}) + \frac{2}{3} \cdot \sum_i w(P_i \cap M_{opt}) \\
     &= \frac{2}{3} \cdot w(M_{opt}).
\end{aligned}$$

We start by proving (1). If $C_i$ is a cycle of length 4 then by the assumptions
of the theorem

$$w(C_i \cap M) \ge w(C_i \cap M_{opt}) \ge \frac{2}{3} \cdot w(C_i \cap M_{opt}).$$

Now assume $C_i$ is a cycle of length at least 6. Let $e_1, e_2, e_3, \dots$ be the edges
of $C_i$ in consecutive order such that $e_j \in M$ for $j$ odd. Consider a subpath of
type $e_{2k+1}, e_{2k+2}, e_{2k+3}, e_{2k+4}, e_{2k+5}$ in $C_i$. By the assumptions of the theorem
we have

$$w(e_{2k+1}) + w(e_{2k+3}) + w(e_{2k+5}) \ge w(e_{2k+2}) + w(e_{2k+4}).$$

By summing this inequality over all possible values for $k$ we see that each edge
of $C_i \cap M$ appears 3 times on the left side and each edge of $C_i \cap M_{opt}$ appears
2 times on the right side of the inequality. Therefore

$$3 \cdot w(C_i \cap M) \ge 2 \cdot w(C_i \cap M_{opt}).$$

This proves (1).
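
To make the counting argument concrete, consider the smallest case it covers: a cycle of length 6 with edges $e_1, \dots, e_6$, where $e_1, e_3, e_5 \in M$ and indices are taken modulo 6. This worked instance is an illustration added here and is not part of the original proof. The three subpaths of five consecutive edges that start with an edge of $M$ give the following inequalities, and adding them yields the claimed bound:

```latex
\begin{align*}
w(e_1) + w(e_3) + w(e_5) &\ge w(e_2) + w(e_4), \\
w(e_3) + w(e_5) + w(e_1) &\ge w(e_4) + w(e_6), \\
w(e_5) + w(e_1) + w(e_3) &\ge w(e_6) + w(e_2), \\[4pt]
3\,\bigl(w(e_1) + w(e_3) + w(e_5)\bigr) &\ge 2\,\bigl(w(e_2) + w(e_4) + w(e_6)\bigr).
\end{align*}
```

Each matching edge occurs once in every inequality, while each non-matching edge occurs in exactly two of them, which is where the factor 2/3 comes from.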

We use a similar idea to prove (2). Let $e_j$ be the edges of a path $P_i$ such
that $e_j \in M$ if and only if $j$ is odd. The path may start with $e_1$ or $e_0$ as we do
not know whether it starts with an edge of $M$ or an edge of $E \setminus M$. Consider
subpaths of $P_i$ of type $e_{2k+1}, e_{2k+2}, e_{2k+3}, e_{2k+4}, e_{2k+5}$. We have

$$w(e_{2k+1}) + w(e_{2k+3}) + w(e_{2k+5}) \ge w(e_{2k+2}) + w(e_{2k+4}).$$

Now extend the path $P_i$ artificially to both sides by four edges of weight 0. So
we have edges $e_{-1}, e_{-2}, \dots$ on the left side of $P_i$. Now add the above inequality
for all $k$ where $k$ starts at $-2$. Then each edge of $P_i \cap M$ appears in three
inequalities on the left and each edge of $P_i \cap M_{opt}$ appears in 2 inequalities on
the right. The artificial edges may appear an arbitrary number of times. As they
have weight 0, it does not matter. Therefore

$$3 \cdot w(P_i \cap M) \ge 2 \cdot w(P_i \cap M_{opt}).$$

This proves (2). □

Theorem 2. Let $G = (V, E)$ be a weighted graph with weight function
$w : E \to \mathbb{R}_+$ and let $M \subseteq E$ be a matching. A maximal set of any of the seven
operations shown in Fig. 1 that are pairwise node disjoint and such that each
operation increases the weight of $M$ can be found in linear time.

Proof. To achieve linear runtime, in a preprocessing step each vertex that is
covered by $M$ gets a pointer to the edge of $M$ it belongs to. Consider an
arbitrary edge $e \in M$. To decide whether $e$ belongs to an $M$-weight-augmenting
$C_4$, run over all edges incident to one endpoint of $e$ and mark all edges of $M$
that are adjacent to such edges. Now run over the edges incident to the other
endpoint of $e$ and see whether they are incident to the other endpoint of a
marked edge. If yes, an alternating $C_4$ has been found. Check whether it is
$M$-weight-augmenting. If yes, remove it from the graph.
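
The marking step for a single matching edge can be sketched as follows. The data layout is an assumption of this illustration: `adj` maps each vertex to its neighbours, `w` maps two-element frozensets to weights, and the dictionary `mate` plays the role of the pointer from a covered vertex to its matching edge. The function name is hypothetical, and the removal of the found cycle from the graph is left out.

```python
def find_augmenting_c4(adj, w, mate, u, v):
    """Look for an M-weight-augmenting 4-cycle through the matching edge (u, v).
    Returns the cycle as (u, x, y, v), where (x, y) is the second matching
    edge, or None if no such cycle exists."""
    # Pass 1: scan the neighbours x of u; mark the matching edge (x, mate[x])
    # by recording its far endpoint.
    marked = {}
    for x in adj[u]:
        if x != v and x in mate:
            marked[mate[x]] = x
    # Pass 2: scan the neighbours y of v; a marked y closes an alternating C4
    # u - x - y - v - u.
    for y in adj[v]:
        if y in marked:
            x = marked[y]
            old = w[frozenset((u, v))] + w[frozenset((x, y))]
            new = w[frozenset((u, x))] + w[frozenset((v, y))]
            if new > old:
                return (u, x, y, v)
    return None
```

Each call only scans the adjacency lists of the two endpoints of the matching edge, so calling it once per matching edge touches every adjacency list a constant number of times.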

Similarly, paths of length at most 4 and paths of length 5 which have an edge
of $M$ at both ends can be found in linear time. □

3 The Algorithms

In this section we briefly describe four different approximation algorithms for the
weighted matching problem in graphs. For all these algorithms we have tested
their performance and runtime with and without our local improvements. The
results of these tests are given in a later section.

The first of the approximation algorithms is Greedy Matching, shown in
Fig. 2. Greedy Matching repeatedly removes the currently heaviest edge $e$ and
all of its adjacent edges from the input graph $G = (V, E)$ until $E$ is empty. In
each iteration $e$ is added to the matching $M$, which is returned as the solution.
It is easy to see that Greedy Matching has a performance ratio of 1/2. If the
edges of $G$ are sorted in a preprocessing step then the runtime is $O(|E| \log |V|)$.

The second algorithm is the LAM algorithm by Preis. A sketch of this 
algorithm is shown in Fig. 3. Preis improved upon the simple greedy approach 
by making use of the concept of a so-called locally heaviest edge. This is defined 
as an edge for which no other edge currently adjacent to it has larger weight. 
Preis proved that the performance ratio of the algorithm based on this idea is 1/2. 
He also showed how a locally heaviest edge in a graph can be found in amortized 
constant time. This results in a total runtime of O(|E|) for the LAM algorithm. 
See the paper by Preis for more details. 
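
The notion of a locally heaviest edge can be illustrated with a simplified sketch. It is not Preis's linear-time procedure: it finds a locally heaviest edge by repeatedly stepping to a heavier adjacent edge, and it makes no attempt at the amortized constant-time search. The edge-list input and the name `lam_matching` are assumptions made for this example.

```python
def lam_matching(edges):
    """Repeatedly pick a locally heaviest edge and delete it with its neighbours."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w

    def remove_node(x):
        for y in list(adj[x]):
            del adj[y][x]
        adj[x].clear()

    matching = []
    for start in list(adj):
        while adj[start]:
            u = start
            v = max(adj[u], key=adj[u].get)      # heaviest edge at u
            while True:
                x = max(adj[v], key=adj[v].get)  # heaviest edge at v
                if adj[v][x] > adj[u][v]:
                    u, v = v, x                  # climb to the heavier edge
                else:
                    break                        # (u, v) is locally heaviest
            matching.append((u, v, adj[u][v]))
            remove_node(u)
            remove_node(v)
    return matching


print(lam_matching([(1, 2, 2), (2, 3, 3), (3, 4, 2)]))
```

Because the weight strictly increases with every climbing step, the inner loop terminates, and the edge it stops at has no heavier neighbour at either endpoint.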

The third algorithm is the Path Growing Algorithm (PGA) by Drake and 
Hougardy, shown in Fig. 4. PGA constructs node disjoint paths in the input 
graph G along heaviest edges by lengthening the path one node at a time. Each 
time a node is added to a path it and all of its incident edges are removed from 
the graph. This is repeated until the graph is empty. By alternately labelling the 
edges of the paths 1 and 2 one obtains two matchings M1 and M2, the heavier of 
which is returned as the solution. The PGA algorithm has a performance ratio 
of 1/2 and a runtime of O(|E|). Two simple improvements have been 
proposed which can be applied to the PGA algorithm without changing its 
runtime. The first is to compute a maximum weight matching along the paths 
constructed by the algorithm and return this as the solution. The second is to 
add any remaining edges in the graph to the solution until the solution becomes 
a maximal matching. Neither of these improvements can guarantee a better worst 
case behaviour for the algorithm, but in practice these improvements can make 



Greedy Matching (G = (V, E), w : E → R+) 

    M := ∅ 
    while E ≠ ∅ do begin 
        let e be the heaviest edge in E 
        add e to M 
        remove e and all edges adjacent to e from E 
    end 
    return M 

Fig. 2. The greedy algorithm for finding maximum weight matchings 



112 Doratha E. Drake and Stefan Hougardy 



LAM (G = (V, E), w : E → R+) 

    M := ∅ 
    while E ≠ ∅ do begin 
        find a locally heaviest edge e ∈ E 
        remove e and all edges adjacent to e from G 
        add e to M 
    end 
    return M 

Fig. 3. The LAM approximation algorithm for finding maximum weight matchings 



PathGrowingAlgorithm (G = (V, E), w : E → R+) 

    M1 := ∅, M2 := ∅ 
    while E ≠ ∅ do begin 
        choose x ∈ V of degree at least 1 arbitrarily 
        grow a path from x along heaviest edges added alternately to M1 and M2 
        remove the path from G 
    end 
    return max(w(M1), w(M2)) 

Fig. 4. The Path Growing Algorithm for finding maximum weight matchings 



MM (G = (V, E), w : E → R+) 

    M := ∅ 
    while E ≠ ∅ do begin 
        choose x ∈ V arbitrarily 
        add to M the heaviest edge e incident to x 
        delete e and all edges adjacent to e from E 
    end 
    return M 

Fig. 5. The MM heuristic for finding maximum weight matchings 



a considerable difference. Therefore we test both versions of this algorithm. We 
call the second version where both improvements are applied PGA'. 
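
The basic PGA (without the two improvements that define PGA') can be sketched in Python as follows. The edge-list input, the adjacency dictionaries and the name `path_growing` are assumptions made for this illustration; ties between equally heavy edges are broken arbitrarily.

```python
def path_growing(edges):
    """Grow node-disjoint paths along heaviest edges; return the heavier matching."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w

    def remove_node(x):
        for y in list(adj[x]):
            del adj[y][x]
        adj[x].clear()

    matchings = ([], [])
    weights = [0, 0]
    i = 0
    for start in list(adj):
        x = start
        while adj[x]:
            y = max(adj[x], key=adj[x].get)  # heaviest edge incident to x
            w = adj[x][y]
            matchings[i].append((x, y, w))
            weights[i] += w
            i = 1 - i                        # alternate between M1 and M2
            remove_node(x)                   # x and its incident edges leave the graph
            x = y
    return matchings[0] if weights[0] >= weights[1] else matchings[1]


print(path_growing([(1, 2, 2), (2, 3, 3), (3, 4, 2)]))
```

Each vertex's adjacency is scanned once before the vertex is removed, so the sketch stays linear in the number of edges, in line with the stated runtime.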

Finally we include a trivial heuristic called MM (for Maximal Matching), 
shown in Fig. 5. This heuristic computes a maximal matching in a graph in 
a greedy manner. For each vertex x a heaviest edge e which is incident to x 
is added to the solution. All edges adjacent to e are removed. The runtime of 
algorithm MM is O(|E|). It is easy to construct examples where the weighted 
matching returned by MM can be arbitrarily bad. Yet it is interesting to see how 
this heuristic behaves in practice. Therefore we have included it in this study. 
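
A short sketch of the MM heuristic, again assuming an edge list of `(u, v, weight)` triples and a hypothetical function name `mm_matching`. The vertices are visited in the order in which they first appear in the input, which stands in for the arbitrary choice of x.

```python
def mm_matching(edges):
    """For each vertex still having edges, take its heaviest incident edge."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w

    def remove_node(x):
        for y in list(adj[x]):
            del adj[y][x]
        adj[x].clear()

    matching = []
    for x in list(adj):
        if adj[x]:
            y = max(adj[x], key=adj[x].get)
            matching.append((x, y, adj[x][y]))
            remove_node(x)
            remove_node(y)
    return matching


# A bad case: starting at vertex 1 forces the light edge and blocks the heavy one.
print(mm_matching([(1, 2, 1), (2, 3, 100)]))
```

The second example shows why the result can be arbitrarily bad: the heavy edge weight can be made as large as desired while MM still returns weight 1.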
4 Test Instances 

We test our implementations of the above algorithms and local improvements 
against four classes of data sets: random graphs, two dimensional grids, complete 
graphs, and randomly twisted three dimensional grids. We do not include any 
geometric instances in this study as none of the algorithms considered here was 
designed to take advantage of any geometric properties. 

Within each class instances with different parameters such as number of 
vertices, edge densities etc. have been generated. For each specific parameter set 
ten instances were randomly generated. Each of the algorithms was run on all 
ten instances and then the average value of the runtime and percentage of the 
deviation from an optimal solution was computed. We calculated the optimal 
solutions to these instances using LEDA. The size of the test instances was 
restricted to graphs with at most 100,000 vertices and at most 500,000 edges 
because for these instances the runtime of the exact algorithm was already several 
hours. 
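
The quality measure used in the tables can be written down in a few lines. This sketch assumes that the reported value is the mean of the per-instance percentage deviations; the text does not say whether the averaging is done this way or on the raw weights, and both function names are hypothetical.

```python
def deviation_percent(approx_weight, optimal_weight):
    """Deviation of an approximate matching weight from the optimum, in percent."""
    return 100.0 * (optimal_weight - approx_weight) / optimal_weight


def average_deviation(pairs):
    """Mean deviation over (approx_weight, optimal_weight) pairs of one parameter set."""
    return sum(deviation_percent(a, o) for a, o in pairs) / len(pairs)


print(average_deviation([(90, 100), (95, 100)]))
```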

The random graphs are based on the G(n,p) model. This means that the graphs 
have n vertices and the possible n(n-1)/2 edges are chosen independently with 
probability p. We have chosen n = 10,000 and p in the range from 5/10000 to 
100/10000 resulting in graphs from about 25,000 to 500,000 edges. The edge 
weights are integer values chosen randomly from the range of 1 to 1000. The 
graphs are labelled "R10000.5" through "R10000.100". 
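
A generator for such instances might look like the sketch below. The function name, the seed parameter and the use of Python's `random` module are assumptions; the authors' actual generator is not described. For n = 10,000 and p = 5/10000 the expected number of edges is p · n(n-1)/2 ≈ 24,998, consistent with the "about 25,000" quoted above.

```python
import random
from itertools import combinations


def random_weighted_graph(n, p, max_weight=1000, seed=None):
    """G(n,p): keep each of the n(n-1)/2 possible edges independently with prob. p."""
    rng = random.Random(seed)
    return [(u, v, rng.randint(1, max_weight))
            for u, v in combinations(range(n), 2)
            if rng.random() < p]


edges = random_weighted_graph(200, 0.05, seed=1)
print(len(edges))
```

Enumerating all vertex pairs is quadratic in n, so for the instance sizes of the paper a skip-based sampling scheme would be preferable; the version here favours clarity.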

The two dimensional grids are grids of dimension h × 1000 with h chosen in 
the range 10 to 100. The edge weights for these grids have been assigned integer 
values chosen randomly from the range between 0 and 999. These graphs are 
labelled "G10" through "G100". 
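
The grid instances can be sketched in the same style. An h × 1000 grid has 1000h vertices and 999h + 1000(h-1) edges, which gives 18,990 edges for h = 10 and matches the value of m listed for G10 in the tables. The function name and the vertex numbering are illustrative assumptions.

```python
import random


def grid_graph(h, width=1000, seed=None):
    """h x width grid with integer edge weights drawn uniformly from 0..999."""
    rng = random.Random(seed)
    edges = []
    for r in range(h):
        for c in range(width):
            v = r * width + c
            if c + 1 < width:            # edge to the right neighbour
                edges.append((v, v + 1, rng.randint(0, 999)))
            if r + 1 < h:                # edge to the neighbour below
                edges.append((v, v + width, rng.randint(0, 999)))
    return edges


print(len(grid_graph(10, seed=0)))
```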

The complete graphs are graphs on n vertices containing all possible n(n-1)/2 
edges. We have generated these graphs for n = 200 to 2000. The integer edge 
weights have been randomly assigned from the range of 0 to 999. These graphs 
are labelled "K200" through "K2000". 

For the randomly twisted three dimensional grids we have used the RMFGEN 
graph generator, which was originally used to generate flow problems. The 
graphs created consist of a square grid of dimension a called a frame. There are b 
such frames F_1, ..., F_b, which are all symmetric. There are edges connecting 
the nodes of F_i to a random permutation of the nodes of F_{i+1} for 1 ≤ i < b. 
The edges within a frame all have weight 500. Those between frames have 
weights randomly chosen between 1 and 1,000. The only change we have made 
to the RMFGEN generator, besides making the graph undirected, concerns 
the weights of the in-frame edges. We have assigned 500 to these edges instead 
of * 1000 assigned by RMFGEN as the latter value did not seem to produce 
instances that were as interesting for the weighted matching problem. We have 
created three such tests on graphs of dimension a = 4, b = 1250; a = 27, b = 27; 
and a = 70, b = 4. These three instances are labelled "a", "b" and "c". 
5 Experimental Results 

The following two subsections contain the experimental results we obtained on 
the test instances described in Section 4. We have compared the five algorithms 
described in Section 3 with and without additionally performing our local 
improvement operations, which were applied as follows: For each of the seven local 
improvement operations shown in Fig. 1 we have computed a maximal set of 
disjoint improvements in time O(m) as indicated in Theorem 2 and then augmented 
along this set. We used the same ordering of the operations as shown in Fig. 1. 

For each row in a table ten different test instances have been generated and 
the average value has been taken. The variance was in all cases below 1%, in 
many cases even below 0.5%. Due to space restrictions we list the results for 
only a subset of the instances. 

5.1 Performance 

Tables 1, 2 and 3 show the performances of the five algorithms on the different 
classes of test sets. The first column of the tables contains the name of the test 
instance as described in Section 4. The next two columns "n" and "m" denote 
the number of vertices and edges of the graph. In case of the graphs "R10000.x" 
the number of edges is the average value of the ten test instances that were 
computed for each row of the table. In all other cases the ten test instances have 
the same number of edges; only the weights of the edges differ. The next five 
columns show the difference in % between the solution found by the algorithms 
and the optimum solution. The names of the algorithms are abbreviated as in 
Section 3. Each row contains one value in bold which is the best value. The 
worst value is given in gray. 

As can be seen from Table 1 there is a great difference in the quality of the 
PGA and PGA' algorithms. The simple heuristics added to the PGA algorithm 
drastically improve its performance in practice. This observation also holds for 
all other test instances. On all random graph instances the PGA' algorithm 
performs best. The LAM and Greedy algorithms have almost the same quality which 



Table 1. Performances of the five algorithms on weighted random graphs with 
different densities. The values denote the difference from the optimum in % 

graph            n        m   Greedy     MM    LAM    PGA   PGA' 
R10000.5     10000    25009     8.36  14.66   8.38  14.70   7.38 
R10000.10    10000    49960     9.18  12.59   9.18  12.56   8.31 
R10000.20    10000   100046     7.73   9.40   7.74   9.52   7.20 
R10000.30    10000   150075     6.28   7.55   6.29   7.55   5.95 
R10000.40    10000   200011     5.46   6.33   5.50   6.30   5.08 
R10000.60    10000   299933     4.19   4.83   4.23   4.82   3.99 
R10000.80    10000   399994     3.52   3.99   3.54   4.02   3.39 
R10000.100   10000   499882     3.05   3.39   3.08   3.45   2.95 
Table 2. Performances of the five algorithms on weighted grid-like graphs. The 
values denote the difference from the optimum in % 

graph        n        m   Greedy     MM    LAM    PGA   PGA' 
G10      10000    18990     5.87  12.38   5.87  12.76   4.51 
G20      20000    38980     5.97  13.04   5.98  12.50   4.53 
G40      40000    78960     5.99  13.45   5.99  12.52   4.60 
G60      60000   118940     5.96  13.68   5.97  12.49   4.66 
G80      80000   158920     5.99  13.79   5.99  12.61   4.67 
G100    100000   198900     6.05  13.82   6.05  12.60   4.66 
a        20000    49984     5.71  10.22   5.50  12.91   5.59 
b        19683    56862     5.79   8.79   5.25  13.14   6.14 
c        19600    53340     6.13   8.00   4.93  12.71   6.22 


Table 3. Performances of the five algorithms on weighted complete graphs. 
The values denote the difference from the optimum in % 

graph      n         m   Greedy    MM   LAM   PGA   PGA' 
K200     200     19900     1.75  1.63  1.77  1.65   1.55 
K600     600    179700     0.72  0.82  0.69  0.75   0.72 
K1000   1000    499500     0.46  0.55  0.49  0.53   0.51 
K1400   1400    979300     0.39  0.38  0.39  0.37   0.35 
K2000   2000   1999000     0.27  0.29  0.29  0.29   0.29 


is slightly worse than that of PGA'. For all algorithms the performance improves 
as the random graphs get denser. The only exception is the extremely sparse 
graphs "R10000.5". Such an effect has also been observed in the unweighted 
case. 

Table 2 shows that the performances of all algorithms are independent of 
the size of the test instances. For the two dimensional grids the PGA' algorithm 
achieves the best solutions. Again the Greedy algorithm and the LAM algorithm 
have almost the same quality, which is significantly worse than that of PGA'. This 
situation changes in the case of the randomly twisted three dimensional grids. 
Here the LAM algorithm achieves the best result and the Greedy algorithm is 
slightly better than PGA'. 

On weighted complete graphs all five algorithms have almost the same 
quality. For large complete graphs the performances of all algorithms tend to one. 
This is of course not surprising as in complete graphs an algorithm can barely 
choose a 'wrong' edge. 

Tables 4, 5 and 6 show the performances of the five algorithms on the 
different classes of test sets with local improvements applied. This means that we have 
taken the solution returned by the algorithms and then computed a maximal set 
of pairwise disjoint local improvements for each of the seven local improvements 
shown in Fig. 1. The quality of all algorithms is drastically improved by the local 
improvements. The deviation from the optimum solution is reduced by a factor 
Table 4. Performances of the five algorithms on weighted random graphs with 
different densities with local improvements applied. The values denote the 
difference from the optimum in % 

graph            n        m   Greedy    MM   LAM   PGA   PGA' 
R10000.5     10000    25009     3.15  4.57  3.15  5.21   3.07 
R10000.10    10000    49960     4.62  5.52  4.62  6.14   4.53 
R10000.20    10000   100046     4.39  4.90  4.39  5.42   4.44 
R10000.30    10000   150075     3.75  4.18  3.76  4.56   3.87 
R10000.40    10000   200011     3.22  3.60  3.26  3.84   3.37 
R10000.60    10000   299933     2.53  2.84  2.55  3.00   2.75 
R10000.80    10000   399994     2.11  2.38  2.15  2.52   2.28 
R10000.100   10000   499882     1.85  2.08  1.87  2.15   2.00 


Table 5. Performances of the five algorithms on weighted grid-like graphs with 
local improvements applied. The values denote the difference from the optimum 
in % 

graph        n        m   Greedy    MM   LAM   PGA   PGA' 
G10      10000    18990     2.02  5.58  2.02  4.86   1.79 
G20      20000    38980     2.11  6.06  2.11  4.94   1.87 
G40      40000    78960     2.11  6.31  2.11  5.01   1.91 
G60      60000   118940     2.12  6.47  2.12  5.01   1.94 
G80      80000   158920     2.11  6.57  2.11  5.05   1.94 
G100    100000   198900     2.16  6.56  2.16  5.05   1.93 
a        20000    49984     2.52  6.00  2.49  6.42   3.00 
b        19683    56862     2.81  6.43  2.61  7.31   3.48 
c        19600    53340     2.99  6.01  2.62  7.54   3.77 


Table 6. Performances of the five algorithms on weighted complete graphs with 
local improvements applied. The values denote the difference from the optimum 
in % 

graph      n         m   Greedy    MM   LAM   PGA   PGA' 
K200     200     19900     0.81  0.98  0.78  0.92   0.94 
K600     600    179700     0.38  0.48  0.35  0.45   0.42 
K1000   1000    499500     0.23  0.32  0.26  0.31   0.31 
K1400   1400    979300     0.19  0.22  0.19  0.21   0.20 
K2000   2000   1999000     0.14  0.17  0.15  0.18   0.18 


between 1.5 and 3. The performances of the Greedy algorithm and the LAM and 
PGA' algorithms are very similar after performing the local improvements. 
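
As a worked check of the quoted reduction factor, the following lines divide the Greedy column of Table 1 (without local improvements) by the Greedy column of Table 4 (with local improvements) for the random graph instances. The numbers are copied from those two tables; the variable names are illustrative only.

```python
# Greedy deviation from the optimum in %, rows R10000.5 through R10000.100.
before = [8.36, 9.18, 7.73, 6.28, 5.46, 4.19, 3.52, 3.05]  # Table 1
after = [3.15, 4.62, 4.39, 3.75, 3.22, 2.53, 2.11, 1.85]   # Table 4

factors = [b / a for b, a in zip(before, after)]
for f in factors:
    print(f"{f:.2f}")
```

For this column every factor lies between 1.5 and 3, with the largest gain on the sparsest instance.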

We also tested how much improvement can be achieved by applying the local 
improvement operations as long as they are possible. Usually computing three 
rounds of maximal sets of these local improvements led to a matching satisfying 
the conditions of Theorem G] In some cases up to 5 iterations were necessary. 
Using this approach the deviation from an optimum solution can be reduced by 
an additional 30% on average. The runtime increases by a factor of 2 to 3. 

5.2 Runtimes 

We compared the runtimes of all five algorithms on all instances against each 
other and also compared them to LEDA's exact algorithm. Table 7 shows the 
relative runtimes of the algorithms LAM, PGA and PGA' compared to the MM 
algorithm, which is clearly the fastest. As can be seen, there is no big difference 
between the runtimes of the algorithms PGA and PGA'. The PGA' algorithm is about 
a factor of 2, the LAM algorithm about a factor of 3 slower than the MM algorithm. 
The factor by which the local improvement operations are slower than the MM 
algorithm is about 4. This means that the LAM and PGA' algorithms with local 
improvements applied are about a factor of 7 and 6 slower than MM, respectively. 

Table 8 shows the relative runtimes of the Greedy algorithm and LEDA's 
exact algorithm compared to the MM algorithm. The Greedy algorithm has 
a worst case runtime of O(m log m). The algorithm implemented in LEDA has 
a runtime of O(mn log n). Therefore one should expect that the Greedy algorithm 
is within a factor of log m and LEDA's algorithm is within a factor of n log n of 
the runtime of the MM algorithm. This behaviour can be roughly confirmed by 
the data. 

To give an impression of the absolute running times we mention these for 
algorithm MM on some large graphs. It took 0.16 seconds on R10000.100, 0.06 
seconds on G100 and 0.62 seconds on K2000. All times were measured on 
a 1.3 GHz PC. 



Table 7. Relative runtimes expressed as a range of the factor within which the 
runtime of the algorithm is slower than the MM algorithm 

MM     LAM           PGA           PGA'          local improvements 
1.00   2.68 - 3.55   1.58 - 2.06   1.88 - 2.10   3.45 - 5.12 


6 Conclusion 

We have suggested a set of seven local improvement operations for weighted 
matchings in graphs which guarantee a performance ratio of 2/3. A maximal set 
of such operations can be found in linear time. We compared five different 
approximation algorithms for the weighted matching problem. The algorithms MM, 
LAM, PGA, and PGA' have linear runtime while the Greedy algorithm has 
runtime O(m log n). The PGA' algorithm is significantly better than PGA. The 
computation of a maximum weight matching on the paths generated by PGA 
can easily be incorporated in the generation of these paths. Therefore the PGA' 
Table 8. Relative runtimes expressed as a factor by which the runtime of the 
Greedy algorithm and LEDA's exact algorithm is slower than the MM algorithm 

graph             n         m   Greedy    MM     LEDA 
a             20000     49984     7.28   1.0    26119 
b             19683     56862     6.98   1.0    27181 
c             19600     53340     5.21   1.0    28944 
R10000.5      10000     25009     5.71   1.0    11156 
R10000.10     10000     49960     8.15   1.0     8255 
R10000.20     10000    100046    10.55   1.0     6004 
R10000.30     10000    150075    12.11   1.0     5096 
R10000.40     10000    200011    13.44   1.0     4660 
R10000.60     10000    299933    15.05   1.0     4286 
R10000.80     10000    399994    15.68   1.0     3949 
R10000.100    10000    499882    15.99   1.0     3832 
G10           10000     18990     6.64   1.0    14873 
G20           20000     38980     9.19   1.0    27667 
G40           40000     78960    11.64   1.0    53737 
G60           60000    118940    13.11   1.0    79321 
G80           80000    158920    14.10   1.0   104473 
G100         100000    198900    15.13   1.0   129857 
K200            200     19900    11.59   1.0      266 
K600            600    179700    20.67   1.0      514 
K1000          1000    499500    23.54   1.0      775 
K1400          1400    979300    22.05   1.0     1006 
K2000          2000   1999000    18.65   1.0     1402 


algorithm requires almost no additional expense in coding or runtime and is 
definitely the better choice. 

Only in the case of complete graphs are there a few instances where the 
Greedy algorithm achieved the highest quality. But in these cases the LAM and 
PGA' algorithm are very close to these results. Therefore the higher runtime 
required by the Greedy algorithm does not justify its application. The LAM or 
PGA' algorithm is a better choice. They usually should guarantee a solution 
within 5% of the optimum. 

The local improvement operations introduced in this paper yield better per- 
formances for all five algorithms. On average the deviation from the optimum 
solution is reduced by a factor of 2. This improvement is achieved by more than 
doubling the runtimes of the linear time algorithms. Still these runtimes are dra- 
matically smaller than those required by exact algorithms. Therefore the linear 
time LAM and PGA' algorithms are definitely the best choice for applications 
where runtimes are of crucial importance. If better quality is needed our local 
improvement operations should be applied, which increase the runtime of these two 
algorithms by roughly a factor of 2-3 only. By applying our local improvement 
operations as long as they are possible the distance from an optimal solution 
can be decreased by an additional 30% while the runtime grows by a factor of 4. 
Abstract. We conduct an experimental evaluation of all major online 
graph traversal algorithms. This includes many simple natural 
algorithms as well as more sophisticated strategies. The observations we 
made watching the animated algorithms explore the graphs in the 
interactive experiments motivated us to introduce some variants of the 
original algorithms. Since the theoretical bounds for deterministic online 
algorithms are rather bad and no better bounds for randomized 
algorithms are known, our work helps to provide a better insight into the 
practical performance of these algorithms on various graph families. We 
observe that all the tested algorithms have a performance very close to 
the optimum offline algorithm in a huge family of random graphs. Only 
a few very specific lower bound examples cause bad results. 



1 Introduction 

The exploration problem is a fundamental problem in online robotics. The robot 
starts in an unknown terrain and has to explore it completely (i.e., draw a map 
of the environment). Since all computations can only be based on local (or 
partial) information, the exploration problem falls naturally into the class of 
online algorithms [4, 7]. The quality of an online algorithm is measured by the 
competitive ratio which means we compare the length of the path the robot 
travels to explore the environment to the length of the shortest path that can 
perform the same task. Of course, the former should be as short as possible to 
achieve a good competitive ratio. More formally, for any given scene S let L_A(S) 
be the length of the path traversed by an (online) algorithm A to explore S. 
Let L_opt(S) be the length of the path of an optimum algorithm that knows the 
scene in advance. Then we call an online exploration algorithm A c-competitive 
if for all scenes S, L_A(S) ≤ c · L_opt(S) (see [10, 7, 4]). 
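
In an experimental setting the definition can only be checked on a finite sample of scenes. The sketch below does exactly that, assuming each scene is summarized by a pair (L_A(S), L_opt(S)); the function names are hypothetical. Passing the check on a sample does not establish c-competitiveness, which requires the inequality for all scenes.

```python
def empirical_ratio(scenes):
    """Largest observed ratio L_A(S) / L_opt(S) over the sampled scenes."""
    return max(online / optimal for online, optimal in scenes)


def holds_on_sample(c, scenes):
    """True if L_A(S) <= c * L_opt(S) for every sampled scene."""
    return all(online <= c * optimal for online, optimal in scenes)


sample = [(12, 10), (30, 20), (8, 8)]
print(empirical_ratio(sample), holds_on_sample(2, sample))
```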

The graph exploration problem is the problem of constructing a complete 
map of an unknown strongly connected digraph using a path that is as short 
as possible. Describing the environment to be explored by the robot by a graph 

* The work described in this paper was partially supported by a grant from the Re- 
search Grants Council of the Hong Kong Special Administrative Region, China 
(Project No. HKUST6010/01E). 
leaves aside all the geometric features of a real environment. So we can focus on 
the combinatorial aspects of the exploration problem. We usually denote a graph 
by G = (V, E), where V is the set of n nodes and E is the set of m edges. We 
denote by R the number of edge traversals of an algorithm to determine a map 
of G, i.e., the adjacency matrix of G. 

We assume the robot can memorize all visited nodes and edges and it can 
recognize them if it reaches them again. Standing on a node, it can sense all the 
outgoing edges but it does not know where the unexplored edges are leading to. It 
cannot recognize how many edges are coming into the node. The directed edges 
can only be traversed from tail to head, not vice versa. We assume w.l.o.g. that 
traversing an edge has unit cost. Thus, the total cost of an exploration tour is 
equal to the number of traversed edges. 

The robot always has the choice of leaving the currently visited node by 
traversing an already known edge or by exploring an unvisited outgoing edge. If 
the current node does not have any unvisited outgoing edges, we say the robot 
is stuck at the current node. 

The well-known Chinese Postman Problem describes the offline version of 
the graph exploration problem [6]. Here, the postman has to find the shortest 
tour visiting all edges of a graph. For either undirected or directed graphs the 
problem can be solved in polynomial time. If the postman needs to serve a mixed 
graph that has both undirected and directed edges, the problem becomes NP- 
complete [9]. 

The online problem for undirected graphs can easily be solved by depth 
first search which visits every edge exactly twice. Since the optimal algorithm 
also needs to visit every edge at least once, the depth first search strategy is 
2-competitive. 
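
The factor of two can be observed directly in a small simulation. The sketch assumes an undirected simple graph given as adjacency lists and counts edge traversals of a robot that follows an unexplored edge, turns back immediately if it arrives at a known node, and otherwise continues the search from the new node before returning. The name `dfs_explore` is an illustrative choice.

```python
def dfs_explore(adj, start):
    """Return the number of edge traversals of a depth-first exploration."""
    visited = {start}
    explored = set()
    traversals = 0

    def visit(u):
        nonlocal traversals
        for v in adj[u]:
            edge = frozenset((u, v))
            if edge in explored:
                continue
            explored.add(edge)
            traversals += 1          # walk u -> v along the new edge
            if v not in visited:
                visited.add(v)
                visit(v)
            traversals += 1          # walk back v -> u
    visit(start)
    return traversals


triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
print(dfs_explore(triangle, 1))
```

On a connected graph with m edges the count is 2m, since every edge is walked once forward and once backward. The recursion depth grows with the length of the search path, so large instances would need an explicit stack.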

The problem becomes more difficult for strongly connected directed graphs. 
It was first investigated by Deng and Papadimitriou [5] who gave an online 
exploration algorithm with R exponential in the deficiency d of the graph (the 
deficiency is the minimum number of edges that must be added to make the 
graph Eulerian). Although it was conjectured that algorithms with R polynomial 
in d might exist, no such algorithm has been found. Even worse, no randomized 
algorithm with better performance than the deterministic ones is known. 
Therefore, we study in this paper the performance of deterministic and randomized 
algorithms experimentally on several families of random graphs to get a better 
understanding of their strengths and weaknesses. Besides simply running 
experiments we used algorithm animation as a further tool to gain deeper 
insight into the behavior of existing and new algorithms, hoping this would lead 
to preliminary ideas for a theoretical analysis. It turns out that all algorithms 
show a very good performance (better than 2-competitive) on all of our graph 
families, except on the family of graphs specially designed to show worst case 
behavior of the common deterministic algorithms. Experimental comparison of 
online algorithms has been done before for other problems, for example for the 
list update problem [3] and for the 2-processor scheduling problem [2]. 
This paper is organized as follows. In Section 2, we give an overview of 
existing deterministic online algorithms for the graph exploration problem. In 
Sections 3 and 4, we propose new deterministic and randomized online algorithms. In 
Section 5, we describe the setup of our experiments. All source code listings are 
omitted due to space limitations but can be found together with more detailed 
tables and graphics at http://www.cs.ust.hk/~trippen/graphtrav/source/. 
In Section 6, we present the results of our experiments. We discuss these results 
in Section 7, and we close with some open problems in Section 8. 



2 Background 

A graph is Eulerian if there exists a path that visits each edge precisely once. 
An Eulerian graph can be explored online with only 2m edge traversals [5] . The 
robot simply takes an unexplored edge whenever possible. If it gets stuck, it 
considers the closed walk (“cycle”) of unexplored edges just visited and retraces 
it, stopping at nodes that have unexplored edges, applying this algorithm re- 
cursively from each such node. It is easy to see that this algorithm achieves 
a competitive ratio of two in Eulerian graphs because no edge will be traversed 
more than twice. The reason for this is that the recursions will always take place 
in completely new parts of the graph. In fact, this algorithm is optimal. 

For non-Eulerian graphs, Deng and Papadimitriou [5] showed that the 
deficiency d of the graph is a crucial parameter of the difficulty of the problem. 
The deficiency d of G is the minimum number of edges that have to be added to 
make G Eulerian. More formally, let in(v) and out(v) denote the in-degree and 
the out-degree of the node v. The deficiency of a node v is d(v) = out(v) − in(v). 
The deficiency d of the graph G is d = \sum_{v : d(v) > 0} d(v). A node with negative 
deficiency is called a sink, a node with positive deficiency is called a source. 
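
The definition translates directly into code. The sketch assumes the digraph is given as a list of (tail, head) pairs; the function name is an illustrative choice.

```python
from collections import Counter


def deficiency(edges):
    """Sum of out(v) - in(v) over all nodes where this difference is positive."""
    d = Counter()
    for tail, head in edges:
        d[tail] += 1   # one more outgoing edge at the tail
        d[head] -= 1   # one more incoming edge at the head
    return sum(x for x in d.values() if x > 0)


# Node 1 is a source (out 2, in 1), node 3 is a sink (out 1, in 2).
print(deficiency([(1, 2), (1, 3), (2, 3), (3, 1)]))
```

A directed cycle has deficiency 0, in line with the fact that it is already Eulerian.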

If an algorithm knew the d missing edges (s_1, t_1), (s_2, t_2), ..., (s_d, t_d) to make 
the graph Eulerian, then a modified version of the Eulerian online exploration 
algorithm could be executed [1]. Whenever the original Eulerian algorithm 
traverses an edge (s_i, t_i), the modified Eulerian algorithm traverses the 
corresponding path from s_i to t_i. This gives an additional factor of at most d for each edge, 
so that the robot traverses each edge at most 2d + 2 times. 

Deng and Papadimitriou suggested to study the dependence of R on m and d 
and showed the first upper and lower bounds. They gave a graph for which any 
algorithm, deterministic or randomized, needs at least Ω(d^2 m / log d) edge 
traversals. Koutsoupias later improved the lower bound for deterministic algorithms 
to Ω(d^2 m). 

Apart from the optimal algorithm for d = 0 which has a competitive ratio 
of two, Deng and Papadimitriou also gave an optimal algorithm for d = 1 with 
a competitive ratio of four. The generalization of this algorithm for greater d 
seems highly complicated, so instead they gave a different algorithm that 
explores a graph with deficiency d using d^{O(d)} m edge traversals. They posed the 
question whether the exponential gap between the upper and lower bounds can 
be bridged. They conjectured the existence of an online exploration algorithm 
whose competitive ratio is polynomial in d for all graphs with deficiency d. No 
such algorithm has been found so far. 

Albers and Henzinger [1] took a first step towards a positive answer to that 
question. They presented the Balance Algorithm (described below) which can 
explore a graph of deficiency d with d^{O(log d)} m edge traversals. They also showed 
that this bound is tight for their algorithm. 

They also gave lower bounds of 2^{Ω(d)} m edge traversals for several natural 
exploration algorithms such as Greedy, Depth-First, and Breadth-First. For 
Generalized Greedy they gave a lower bound of d^{Ω(log d)} m. 

We now give a short description of these algorithms. 

— Greedy (Gr): 

If stuck at a node y, the robot moves to the nearest known node z that has 
unexplored outgoing edges. As every edge has unit cost we are looking for 
a shortest path to an unfinished node. 

— Generalized-Greedy (G-G): 

At any time, for each path in the subgraph explored so far, define a 
lexicographical vector as follows. For each edge on the path, determine its current 
cost, which is the number of times the edge was traversed so far. Sort these 
costs in non-increasing order and assign this vector to the path. If stuck at 
a node y, out of all paths to nodes with unexplored outgoing edges the robot 
traverses the path whose vector is lexicographically minimum. 

— Depth-First Search Strategy (DFS): 

If stuck at a node y, the robot moves to the most recently discovered node z 
that can be reached and that has unexplored outgoing edges. 

— Breadth-First Search Strategy (BFS): 

Let V be the node where the exploration started. If stuck at a node y, the 
robot moves to the node z that has the smallest distance from v among all 
nodes with unexplored outgoing edges that can be reached from y. 

— Balance Algorithm (Bal): 

A graph with deficiency d can be made Eulerian by adding d edges (s_1, t_1), 
..., (s_d, t_d). The Balance algorithm is too involved to give a full description 
here, see [1] for details. Basically, it tries to find the above mentioned missing 
edges by maintaining d edge-disjoint chains such that the end-node of chain i 
is s_i and the start-node of chain i is the current guess of t_i. The current guess 
of t_i will be marked with a token τ_i. As the algorithm progresses, paths can be 
appended at the start of each chain or inserted into chains. At termination, 
the start-node of chain i is indeed t_i. To mark chain i all edges on chain i 
are colored with color i. 

The algorithm consists of two phases. 

Phase 1: The robot traverses unexplored edges until getting stuck at a sink s. 
The algorithm switches to Phase 2. The cycle starting and ending at s is the 
initial chain C_0. 

Phase 2: Phase 2 consists of sub-phases. From a high-level point of view, at 
any time, the subgraph explored so far is partitioned into chains, namely C_0 
and the chains generated in Phase 2. During the actual exploration in the 
sub-phases, the robot travels between chains. While doing so, it generates or 
extends fresh chains, which will be taken into progress later, and finishes the 
chains currently in progress. During each sub-phase the robot visits a current 
node x of a current chain C and makes progress towards finishing the nodes 
of C. The current node of the first sub-phase is s, its current chain is C_0. 
The current node and current chain of sub-phase j depend on the outcome 
of sub-phase j − 1. All the chains introduced in Phase 2 are marked with 
a token. 

A tree T is maintained by the algorithm such that each chain C corresponds
to a node v(C) of T, and v(C') is a child of v(C) if the last sub-path appended
to C' was explored while C was the current chain.

While the robot always explores unvisited edges until it gets stuck, it
relocates using the tree T. While the current chain is not finished it relocates to
the next unfinished node on it. Put simply, the robot relocates to an
unfinished chain C'' for which the subtree rooted at v(C'') has the smallest number
of tokens. In this sense the algorithm always tries to balance out the tree.
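The relocation rules of the DFS and BFS strategies above can be sketched in a few lines. This is only an illustration under stated assumptions: the explored part of the graph is given as an adjacency dictionary, `discovered_at` records discovery times, and all function and parameter names are hypothetical. Ties are broken arbitrarily, which the text does not specify.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source along explored edges (adj: node -> successors)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, []):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def relocation_target(adj, unfinished, stuck_at, start, discovered_at, strategy):
    """Pick the node the robot relocates to when stuck (assumed interface).

    unfinished: nodes that still have unexplored outgoing edges.
    """
    reachable = bfs_distances(adj, stuck_at)
    candidates = [z for z in unfinished if z in reachable]
    if not candidates:
        return None
    if strategy == "DFS":
        # most recently discovered reachable node with unexplored edges
        return max(candidates, key=lambda z: discovered_at[z])
    if strategy == "BFS":
        # reachable node with unexplored edges closest to the start node
        from_start = bfs_distances(adj, start)
        return min(candidates, key=lambda z: from_start[z])
    raise ValueError(f"unknown strategy: {strategy}")
```

For example, on the explored graph `{0: [1], 1: [2], 2: [0, 3], 3: [0]}` with unfinished nodes 1 and 3 and the robot stuck at the start node 0, the DFS rule selects node 3 and the BFS rule selects node 1.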

3 New Deterministic Algorithms 

We also studied other deterministic algorithms. We considered different varia- 
tions of Greedy and a new algorithm called List. 

— Cheapest-Edge-Greedy (E-G): 

If stuck at a node y, the robot follows a path to a node that still has
unexplored outgoing edges, choosing at each node the outgoing edge that
has been used least often. In case of a tie, the edge with the earliest
exploration time is chosen.

This algorithm is similar to Generalized Greedy. However, for this
algorithm we can give a simple worst-case example with deficiency 0 that makes
the robot oscillate between the two ends of the graph, so it has a rather bad
competitive ratio. Figure 1 shows the behavior of the robot on the graph.
We string together many cycles of length two. The robot starts in the middle
of this long chain (node 1), and it will alternately explore one new cycle at
the right end (nodes 2, 4, 6, ...) and then at the left end (nodes 3, 5, 7, ...),
always traversing all the cycles in between again and again.




[Figure 1: chain of cycles of length two; node labels from left to right: 7 5 3 1 2 4 6]

Fig. 1. The numbers represent the time at which each node was explored
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— Path-Cost-Greedy (C-G): 

This algorithm determines the cost of a path as the sum of its edge costs,
where the cost of an edge is the number of times it has been traversed so far.
If stuck at a node y, the robot follows the cheapest path to a node that still
has unexplored outgoing edges.

— List Algorithm (List): 

Like Balance, List builds chains. However, the chains are stored in a list and
every chain has an energy value associated with it. Chains are considered
in the order in which they appear in the list. If the robot gets stuck in a chain,
this chain loses energy. If the energy of a chain drops to zero, the chain is
moved from its current position in the list to the end, with full energy again.
A full description does not fit into the space available here. No theoretical
upper bound for this algorithm has been proven yet.
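The relocation step of Path-Cost-Greedy amounts to a shortest-path search in which each explored edge costs as much as its traversal count. The following is a minimal sketch using Dijkstra's algorithm; the data layout (`adj`, `traversals`, `unfinished`) and all names are assumptions made for illustration and are not taken from the authors' programs.

```python
import heapq

def cheapest_relocation_path(adj, traversals, unfinished, stuck_at):
    """Return (cost, path) to the cheapest node with unexplored outgoing edges.

    adj:        node -> list of successors along explored edges
    traversals: (u, v) -> number of times edge (u, v) was traversed so far (>= 1)
    unfinished: set of nodes that still have unexplored outgoing edges
    """
    best = {stuck_at: 0}
    parent = {}
    heap = [(0, stuck_at)]
    while heap:
        cost, u = heapq.heappop(heap)
        if cost > best[u]:
            continue  # stale heap entry
        if u in unfinished and u != stuck_at:
            path = [u]
            while path[-1] != stuck_at:
                path.append(parent[path[-1]])
            return cost, path[::-1]
        for w in adj.get(u, []):
            new_cost = cost + traversals[(u, w)]
            if new_cost < best.get(w, float("inf")):
                best[w] = new_cost
                parent[w] = u
                heapq.heappush(heap, (new_cost, w))
    return None  # no unfinished node is reachable
```

Because every explored edge has been traversed at least once, all edge costs are positive and Dijkstra's algorithm applies. A two-edge path over rarely used edges can therefore beat a path whose first edge has been used many times.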



4 Randomized Algorithms 

— Random Balance (R-B): 

Whereas the Balance algorithm relocates to a subtree which has the minimum
number of tokens, Random Balance chooses this subtree randomly.

— Harmonic (Har): 

When getting stuck, the robot will relocate to a node that still has unexplored 
outgoing edges with a probability that is inversely proportional to its distance 
from the current robot position. 

— Random (Rand) : 

After determining the paths to nodes with unexplored edges which do not 
pass any other unfinished node, the robot chooses one of these paths uni- 
formly at random. 
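The selection rule of the Harmonic strategy can be illustrated as follows. The sketch assumes that the distances from the current robot position to all candidate nodes are already known and positive; the function names are hypothetical.

```python
import random

def harmonic_weights(distances):
    """Selection probabilities inversely proportional to distance.

    distances: candidate node -> distance (> 0) from the current robot position.
    """
    inverse = {z: 1.0 / d for z, d in distances.items()}
    total = sum(inverse.values())
    return {z: w / total for z, w in inverse.items()}

def harmonic_choice(distances, rng=random):
    """Draw one relocation target according to the harmonic weights."""
    probs = harmonic_weights(distances)
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[z] for z in nodes], k=1)[0]
```

With candidate distances 1, 2, and 4, the unnormalized weights are 1, 1/2, and 1/4, so the nearest node is chosen with probability 4/7.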

5 Experimental Setup 

Since we compare the number of edge traversals needed to explore a graph rather
than the running times of the algorithms, we do not give machine details and
only describe the testbeds we used. Naturally, more sophisticated algorithms need
more computing time, whereas the robot movements themselves take no time. So
even when simple algorithms need more edge traversals, they might finish their
computations faster than more complicated algorithms.

We generated a huge testbed of random graphs. All programs can be found 
at http://www.cs.ust.hk/~trippen/graphtrav/source/. After creating one 
single cycle with m nodes and m edges, we randomly merged nodes until only n
nodes were left. Merging two nodes v_1 and v_2 means that the newly emerging
node v has an incoming edge from a node u if at least one of the nodes v_1 or v_2
had an incoming edge from node u, and v has an outgoing edge to a node w
if at least one of the nodes v_1 or v_2 had an outgoing edge to node w. This
created a “random” graph of deficiency zero. To obtain a graph of deficiency
d > 0, we added d edges, ensuring that every new edge really increased the
current deficiency by one (so the graphs actually have m + d edges). We used
four different methods (named V1 to V4 in the following) to add these d edges to
obtain families of graphs with different structural properties where the various
algorithms might show different behavior (as it turned out, their behavior did
not differ). We either introduced the d additional edges randomly (V1), or we
restricted them to starting at one common source (V2) or ending at one common
sink (V3), or we chose one common source and one common sink (V4) for all
these edges.
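The construction can be sketched as follows. This is not the authors' generator: it assumes that the graph is kept as a multigraph, so that parallel edges and self-loops produced by a merge are retained, and it assumes the usual definition of deficiency as the total surplus of outgoing over incoming edges, summed over all nodes with a surplus. All names are hypothetical.

```python
import random
from collections import Counter

def merged_cycle(m, n, seed=0):
    """Start from a directed cycle on m nodes and merge random node pairs
    until n nodes remain. Returns the list of m directed edges."""
    rng = random.Random(seed)
    label = list(range(m))   # label[i]: node that original cycle node i now belongs to
    alive = list(range(m))   # nodes that have not been merged away
    while len(alive) > n:
        a, b = rng.sample(alive, 2)
        label = [a if x == b else x for x in label]  # merge b into a
        alive.remove(b)
    return [(label[i], label[(i + 1) % m]) for i in range(m)]

def deficiency(edges):
    """Sum over all nodes of max(0, out-degree - in-degree)."""
    balance = Counter()
    for u, v in edges:
        balance[u] += 1
        balance[v] -= 1
    return sum(b for b in balance.values() if b > 0)
```

Since merging nodes of a closed walk keeps every node balanced, `deficiency(merged_cycle(60, 30))` is zero. Adding one edge between two distinct balanced nodes raises the deficiency to one, which matches the requirement that each added edge increases the current deficiency by one.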

Although our algorithm to create the test graphs does not really create graphs 
according to a uniform distribution on all graphs with a certain deficiency (we 
do not know how to do that), our graphs do not seem to have hidden structural 
properties that could be unfairly exploited by any of the online algorithms. 

We considered graphs with a deficiency from 0 up to 30. In our first series
of experiments we considered graphs with n = 30 nodes. We generated graphs
of different sizes with m = 60, m = 100, m = 200, and m = 400 edges. In the
second series of experiments we considered graphs with n = 100 nodes and with
m = 500, m = 2,000, and m = 5,000 edges.

For each choice of parameters we generated 100 random graphs, both for the
graphs with n = 30 nodes and for those with n = 100 nodes. Only for the big
graphs with n = 100 and m = 5,000, whose exploration is too time consuming,
did we generate just ten random graphs. While graphs of our first series were
explored very quickly, our second series took much more time. A single algorithm
usually took a few minutes for graphs with m = 500 edges. On a graph with
m = 2,000 edges it needed about 15 minutes. For the large graphs with m = 5,000
edges, the algorithms usually traversed only one edge per minute on average.

For the smaller graphs in the first series, we started each algorithm on every 
node of the graph. This way, each graph gave rise to 30 different experiments for 
each algorithm. Since we used 100 random graphs, every point in the diagram 
(for a particular parameter setting) for each algorithm is the average value of 
3,000 runs. 

Due to the long running times of the exploration algorithms on graphs of
our second series, we started every algorithm only once on each graph. Thus, for
these graphs every point in the diagram (for a particular parameter setting) for
each algorithm is only the average value of 100 runs (or ten runs for the large
graphs).

We also generated the families of lower bound graphs for the old deterministic 
algorithms to show their bad performance on these examples and to test how 
our new algorithms would perform on these graphs. 

In addition to our own random graphs, we used random graphs published on the
Internet by the University “Roma Tre”, Italy:
http://www.dia.uniroma3.it/~gdt/editablePages/test.htm.

These graphs range in size from 10 up to 100 nodes. They are sparse,
biconnected, and directed. We only added edges to make them strongly
connected. The results of the experiments on these graphs are not listed in
Section 6, but they were quite similar to those for our random graphs with
similar numbers of nodes, edges, and deficiencies.

6 Results 

Table 1 and Fig. 2 show the performance of some old exploration algorithms
(different variations of them will be discussed later) and our new List algorithm
on the lower bound example for the Greedy algorithm [1]. Indeed, the same
graphs can also be used as lower bound graphs for DFS and BFS. For the size of
these graphs see Table 1. All algorithms clearly show a competitive ratio growing
superpolynomially in the deficiency d (note that the y-axis is logarithmic).
Because the size of these graphs grows exponentially, Fig. 2 only shows a smaller
range of the x-axis compared to the following figures.
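To make the growth of the competitive ratio concrete, the ratio of edge traversals to OPT can be computed directly from the Greedy, List, and OPT columns of Table 1. The snippet below is only a small worked calculation on those published values; the variable names are illustrative.

```python
# Values copied from Table 1 (lower bound graph for Greedy, DFS, and BFS).
deficiency = [1, 2, 3, 4, 5, 6, 7]
opt = [15, 43, 124, 316, 785, 1858, 4276]
greedy = [26, 94, 329, 1182, 4002, 12808, 37997]
list_alg = [26, 86, 258, 688, 2102, 4635, 13479]

def ratios(traversals, optimum):
    """Observed ratio of edge traversals to the optimum, per deficiency."""
    return [t / o for t, o in zip(traversals, optimum)]

greedy_ratio = ratios(greedy, opt)
list_ratio = ratios(list_alg, opt)

for d, g, l in zip(deficiency, greedy_ratio, list_ratio):
    print(f"d={d}: Greedy/OPT={g:.2f}  List/OPT={l:.2f}")
```

On these graphs the ratio for Greedy increases with every step of d and exceeds 8 at d = 7, while List stays well below it.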

Table 2 and Fig. 3 compare the performance of all algorithms on three graphs 
of the family for the lower bound example for the Balance algorithm [1]. Because 
of the special recursive structure of this lower bound example there are only 
graphs with deficiency 2, 7, and 22. These three graphs have sizes m = 16 and 
n = 14, m = 76 and n = 64, and m = 886 and n = 739, respectively. 

Table 3 and Fig. 4 show the results of experiments over 100 different random
graphs with 100 nodes and 500 + d edges, where the d edges were added
randomly (V1). Since all natural exploration algorithms and their variations
behave very similarly to each other, most curves in the diagram would overlap.
This is why we only show a few curves
in the diagrams. Larger graphics with finer granularity can be downloaded at


Table 1. Number of edge traversals on the lower bound graph for Greedy, DFS,
and BFS [1]

d      m      n    OPT   Greedy     DFS     BFS   Balance    List
1     13     10     15       26      26      26        26      26
2     31     24     43       94      94      94        86      86
3     71     56    124      329     329     329       330     258
4    159    128    316     1182    1182    1182       908     688
5    351    288    785     4002    4082    4002      2909    2102
6    767    649   1858    12808   13354   12808      8211    4635
7   1663   1408   4276    37997   41362   37997     20156   13479

Table 2. Number of edge traversals on the lower bound graph for Balance [1]

d     OPT     Gr    G-G    C-G    E-G    DFS    BFS   List    Bal    R-B    Har   Rand
2      33     39     39     39     39     39     39     39     63     63     39     39
7     370    474    475    475    475    518    474    480    752    968    430    474
22  11597  38828  45344  38360  45327  42175  48333  29235  28608  97779  39758  40284

http://www.cs.ust.hk/~trippen/graphtrav/source/.

Due to space limitations we cannot include the results of experiments with all
other graph classes (28 different sizes, and different types of adding the d edges,
V1-V4) in this extended abstract, but the algorithms behaved very similarly on
all the other graph classes. In Table 4 and Fig. 5, we show for comparison the
results for random graphs with 30 nodes and 400 + d edges, V4.

7 Evaluation of the Results 

The large testbed of random graphs shows that in most cases the
performance of the algorithms is quite close to the optimum (better than
2-competitive). However, for all the natural deterministic algorithms presented here
it is possible to give a lower bound example where only exp(d)-competitiveness
can be achieved. But we cannot give a general lower bound that is greater than
Ω(d^2) for any online algorithm.

Albers and Henzinger [1] mentioned that with respect to the current knowl- 
edge of the s-t connectivity problem it seems unlikely that one can prove super- 
polynomial lower bounds for a general class of graph exploration algorithms. 



[Figure 2: plot titled “Edge Traversals on Lower Bound Graph for Greedy, DFS, and BFS”; x-axis: Deficiency d]

Fig. 2. Number of edge traversals on the lower bound graph for Greedy, DFS,
and BFS [1]. See Table 1 for sizes of the graphs. DFS and BFS behave very similarly
to Greedy and are therefore omitted here

[Figure 3: plot titled “Edge Traversals on Lower Bound Graph for Balance Algorithm”]

Fig. 3. Number of edge traversals on the lower bound graph for Balance [1].
The three graphs have sizes m = 16 and n = 14, m = 76 and n = 64, and
m = 886 and n = 739, respectively. Here we only show three curves because the
curves of all other algorithms are overlapped by the curve of the List algorithm



Indeed, every deterministic algorithm usually focuses on a specific part of the
graph. This behavior is well known to the adversary, and he can exploit it to force
the algorithm to make large detours.

Greedy is always sticking to its nearby environment and it never considers 
changing to a different part of the graph unless the current local area is com- 
pletely explored. 

DFS has high relocation costs due to the fact that it always relocates to the 
“deepest” node where it could continue. 

BFS changes to different parts of the graph when getting stuck but these 
parts always have about the same distance from the initial start node of the 
exploration. In doing so it might swap too often between different areas of the 
graph which implies high relocation costs. 

The Balance algorithm might be forced to move tokens inside “small” areas.
It then neglects other parts of the graph for too long.

All the lower bound graphs for individual algorithms have one feature in com- 
mon. They exploit the weakness of the specific algorithm and lead it again and 
again through the same bottleneck node. There are usually subproblems con- 
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Table 3. Average number of edge traversals on random graphs with 100 vertices 
and 500 + d edges (V1)

d     OPT     Gr    G-G    C-G    E-G    DFS    BFS   List    Bal    R-B    Har   Rand
3     510    521    522    521    578    539    533    600    616    616    530    533
6     519    532    532    532    610    561    552    629    663    663    546    552
9     528    542    542    542    638    584    572    652    705    706    560    568
12    535    553    553    552    664    600    592    686    762    763    578    584
15    541    561    564    562    689    619    608    727    823    825    590    598
18    549    573    571    569    715    631    622    746    882    886    602    611
21    557    577    584    579    743    647    641    775    897    899    610    620
24    563    587    592    589    746    677    661    787    904    908    629    642
27    569    593    602    595    757    691    674    816    976    980    638    651
30    575    600    614    605    789    708    690    837   1031   1033    652    669

Table 4. Average number of edge traversals on random graphs with 30 vertices
and 400 + d edges (V4)

d     OPT     Gr    G-G    C-G    E-G    DFS    BFS   List    Bal    R-B    Har   Rand
3     408    414    415    415    415    418    416    424    433    433    424    418
6     414    425    427    426    428    430    426    448    466    466    452    432
9     424    438    457    439    441    445    441    464    492    492    468    443
12    431    441    466    448    478    452    449    467    499    499    448    452
15    444    460    511    461    503    466    466    480    524    524    463    466
18    452    467    627    472    524    475    473    492    546    546    472    473
21    459    478    686    483    553    489    488    490    549    549    484    486
24    465    481    779    494    576    497    487    504    569    569    490    490
27    468    490    810    501    586    503    490    519    586    586    497    497
30    478    494    917    517    629    521    505    510    588    588    506    511

structed in a recursive manner forcing the algorithm to traverse the bottleneck 
a super-polynomial(d) number of times. 

Because of the simple structure of the “natural” exploration algorithms,
Greedy and its variations, DFS, and BFS, it is easy to give exponential lower
bound examples. Only the more sophisticated Balance algorithm by Albers
and Henzinger [1] achieves a sub-exponential number of edge traversals,
namely d^{O(log d)} m.

The List algorithm uses the same basic concept as Balance, but it
considers the chains in a different order. List will not move the tokens of the same
group of chains for a long time within the same part of the graph if this always
implies relocations via the sink. Instead, it uses an energy value associated with
each chain to preempt further exploration of some chains and to switch to
other chains that have not caused so many relocations through their sinks.

For randomized algorithms, it seems harder to construct lower bound
examples. The adversary cannot “hide” some part of the graph as easily as he
can when he knows where the algorithm will continue its exploration after
getting stuck. The adversary can only “determine” which of the unexplored
outgoing edges the robot will choose after relocating to a vertex with unexplored
outgoing edges.

[Figure 4: plot titled “Average Edge Traversals”]

Fig. 4. Average number of edge traversals on random graphs with 100 vertices
and 500 + d edges (V1). Omitted curves are mostly overlapped by the curve of
the Greedy algorithm. Random Balance behaves very similarly to Balance

Despite their bad worst-case performance bounds, all natural online
exploration algorithms showed near-optimal performance on our set of random graphs.
Apparently, the randomly distributed edges in the graphs make relocation paths
for these algorithms very short. The simplest algorithms, Greedy (Gr) and Rand,
found the shortest tours. Generalized Greedy (G-G) and Path-Cost Greedy
(C-G) behave very similarly to Greedy, for two reasons. First, edges leading
to nodes with unexplored outgoing edges have not been traversed very often, so the
path-vectors of paths leading to such nodes are very small, and Generalized
Greedy prefers such a path. Second, a path whose edges have not been traversed
very often has low cost, and in general short paths often have lower costs
than longer paths, so Path-Cost Greedy also tends to choose the same path as
Greedy. However, in a very dense graph, especially of variant V4, Generalized
Greedy prefers a “detour” on rarely traversed edges to a short path of
often used edges. This results in a large number of total edge traversals, which
can be seen in Fig. 5.

[Figure 5: plot titled “Average Edge Traversals”; x-axis: Deficiency d, ticks from 10 to 30]

Fig. 5. Average number of edge traversals on random graphs with 30 vertices
and 400 + d edges (V4). Omitted curves are mostly overlapped by the curve of
the Greedy algorithm. Random Balance behaves very similarly to Balance



The algorithms that work with chains (Balance, Random Balance, and List)
have a slight disadvantage in these cases, as they are forced to follow their chains
even if there are shorter ways to relocate. However, this only leads to a small
overhead. We note that List performs better than Balance, although we do
not have any theoretical analysis of this phenomenon. The standard exploration
strategies DFS and BFS perform somewhere in the middle between Greedy and
Balance.

On the other hand (and not surprisingly), the sophisticated algorithms do
much better on the very special graphs that provide lower bounds for the simple
algorithms.



8 Conclusions and Open Problems 

In our experiments, we observe that all algorithms show a very good performance
(better than 2-competitive) on all random graph families we tested, except on
the family of graphs specially designed to show worst-case behavior of the
common deterministic algorithms, where we observe an exponential number of edge
traversals.

The main open problem remains unsolved. Is there an exploration algorithm
for strongly connected directed graphs with deficiency d that achieves an upper
bound on the number of edge traversals that is polynomial in d?

Randomization seems to be a promising way, as our experimental results have
shown that randomized algorithms do not show any disadvantage compared to
deterministic ones. On the contrary, it seems to be hard to design good lower
bound examples for them. Future work should include the analysis of randomized
algorithms under different models of adversaries.
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Abstract. We propose a new method for finding the minimum ratio-cut 
of a graph. Ratio-cut is an NP-hard problem for which the best previously
known algorithm gives an O(log n)-factor approximation by solving its
dually related maximum concurrent flow problem. We formulate the min- 
imum ratio-cut as a certain nondifferentiable optimization problem, and 
show that the global minimum of the optimization problem is equal to the 
minimum ratio-cut. Moreover, we provide strong symbolic computation 
based evidence that any strict local minimum gives an approximation by 
a factor of 2. We also give an efficient heuristic algorithm for finding a lo- 
cal minimum of the proposed optimization problem based on standard 
nondifferentiable optimization methods and evaluate its performance on 
several families of graphs. We achieve O(n^{1.6}) experimentally obtained
running time on these graphs.



1 Introduction 

Balanced cuts of a graph are hard computational problems that are important both in
theory and practice. Ratio-cut is the most fundamental one, since most of the
others, including minimum quotient cut, minimum bisection, and multi-way cuts, can
be easily approximated using it [13, 20]. Several other important approximation
algorithms, such as those for crossing number and minimum cut linear arrangement,
are also based on the ratio-cut [13]. Ratio-cut has many practical applications, the most
important being VLSI design, clustering, and partitioning [23, 14, 1].

Since ratio-cut is an NP-hard problem [15], we must seek approximation
algorithms to solve it in practically reasonable time. Many purely heuristic
algorithms were developed [23, 25, 21, 8], most of them relying on simulated
annealing, spectral methods, or iterative movement of nodes from one side of the
partition to the other. A common idea exploited by several authors [25, 2, 9, 21, 22] to
improve their quality is using a multi-scale graph representation, usually obtained
by edge contraction. First a partition at the coarsest scale is obtained and
then refined to a more detailed one by one of the mentioned algorithms. Although
these algorithms may perform well in practice, no optimality bounds are known
for them.
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The best previously known algorithm with proven optimality bounds finds an 
O(log n)-factor approximation, where n is the number of nodes in the graph. It is
based on a reduction of the ratio-cut problem to the multi-commodity flow
problem, which can be solved with polynomial-time linear programming methods.
Unfortunately this method is not practical, since the size of the resulting linear
program is quadratic in the number of nodes in the graph and it cannot be solved
efficiently. Later, approximation algorithms [16, 17, 12, 24] were discovered for the
multi-commodity flow problem itself, making this approach usable in practice.
Several heuristic implementations [16, 11, 24] are based on this idea, some of
them quite effective and practical. The most elaborate one [11] can deal with up
to 100,000-node graphs in reasonable time.

In this paper we propose a new way of finding the minimum ratio-cut of
a graph. We construct a nondifferentiable optimization problem whose minimum
solution equals the minimum ratio-cut value and use nonlinear programming
methods to search for it. Since the problem is non-convex, we may find only
a local minimum. However, we show that any strict local minimum gives us
a cut that is within a factor of 2 of the optimum. For that purpose we introduce
the notion of a locally minimal ratio cut, for which no subset of nodes taken from
one side of the cut and moved to the other side decreases the cut value. We
establish a one-to-one correspondence between strictly locally minimal cuts and
strict local minima of the proposed optimization problem.

The reduction of an NP-hard discrete problem to a continuous one is not
a novel idea. For example, the maximum clique problem of a graph can also be
stated as an optimization problem [6], and numerical optimization methods for
finding the optimum may be used. However, for the maximum clique problem no
optimality bounds of a local minimum are known. To show that our method is
practical, we present an efficient heuristic algorithm for finding a local minimum
of the proposed problem, which is based on the standard methods of
nondifferentiable optimization, and analyze its performance on several families of graphs.
With the proposed method we can find a good partition of a 200,000-node graph
in less than one hour.

2 Problem Formulation 

We are dealing with an undirected graph $G = (V, E)$, where $V$ is its node
set and $E$ is its edge set. The nodes of the graph are identified by natural
numbers from 1 to $n$. Each node $i$ has a weight $d_i \ge 0$, and each edge $(i, j)$ has
a capacity $c_{ij} \ge 0$, satisfying the properties $c_{ij} = c_{ji}$, $c_{ii} = 0$. We define
$c_{ij} = 0$ when there is no edge $(i, j)$ in $G$. We assume that there are at least two
nodes with non-zero weights. $(A, A')$ denotes a cut that separates a set of nodes $A$
from its complement $A' = V \setminus A$. The capacity of the cut $C(A, A')$ is the sum of
edge capacities between $A$ and $A'$. The ratio of the cut $R(A, A')$ is defined as follows:

$$R(A, A') = \frac{C(A, A')}{\sum_{i \in A} d_i \cdot \sum_{i \in A'} d_i} \qquad (1)$$
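As a small illustration of definition (1), the ratio of a cut can be computed directly from the capacities and node weights. The sketch below assumes that each undirected edge is stored once as a pair in a dictionary and that both sides of the cut carry positive total weight; the names are hypothetical.

```python
def cut_ratio(capacity, weight, side_a):
    """Ratio R(A, A') of the cut separating side_a from the remaining nodes.

    capacity: {(i, j): c_ij}, each undirected edge listed once
    weight:   {i: d_i}
    """
    a = set(side_a)
    rest = set(weight) - a
    # capacity of the cut: edges with exactly one endpoint in A
    cut = sum(c for (i, j), c in capacity.items() if (i in a) != (j in a))
    return cut / (sum(weight[i] for i in a) * sum(weight[i] for i in rest))
```

On the path 1-2-3-4 with unit capacities and unit weights, the cut A = {1, 2} has ratio 1/(2·2) = 0.25, while A = {1} has ratio 1/(1·3).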
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We will focus on finding the minimum ratio-cut i.e. the cut (A, A') with the 
minimum ratio over all non-empty $A$, $A'$.

Definition 1. A cut $(A, A')$ is locally minimal if for all non-empty $U \subset A$, $U \ne A$:
$R(A, A') \le R(A \setminus U, A' \cup U)$, and for all non-empty $U' \subset A'$, $U' \ne A'$:
$R(A, A') \le R(A \cup U', A' \setminus U')$. Similarly we call a ratio-cut strictly locally minimal
if strict inequalities hold in this definition.
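For very small graphs, Definition 1 can be checked by brute force over all subsets that may be moved across the cut. The following sketch is exponential in the number of nodes and is meant only to illustrate the definition. It assumes integer capacities and positive integer weights so that exact rational arithmetic can be used, and all names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def cut_ratio(capacity, weight, side_a):
    """Exact ratio R(A, A'); capacity lists each undirected edge once."""
    a = set(side_a)
    rest = set(weight) - a
    cut = sum(c for (i, j), c in capacity.items() if (i in a) != (j in a))
    return Fraction(cut, sum(weight[i] for i in a) * sum(weight[i] for i in rest))

def proper_subsets(nodes):
    """All non-empty proper subsets of the given node set."""
    items = sorted(nodes)
    for k in range(1, len(items)):
        for combo in combinations(items, k):
            yield set(combo)

def is_locally_minimal(capacity, weight, side_a, strict=False):
    """Check Definition 1 for the cut (A, A') with A = side_a."""
    a = set(side_a)
    rest = set(weight) - a
    r = cut_ratio(capacity, weight, a)
    moved = [a - u for u in proper_subsets(a)]        # move U from A to A'
    moved += [a | u for u in proper_subsets(rest)]    # move U' from A' to A
    for new_a in moved:
        other = cut_ratio(capacity, weight, new_a)
        if other < r or (strict and other == r):
            return False
    return True
```

On the path 1-2-3-4 with unit capacities and weights, the middle cut {1, 2} is strictly locally minimal, whereas the cut {1} is not, because moving node 2 across lowers the ratio from 1/3 to 1/4.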



3 Ratio-Cut as an Optimization Problem 



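We can assign a variable $x_i \in \mathbb{R}$ to each node $i$ and consider the following
optimization problem over $x$ from $\mathbb{R}^n$:

$$\min F(x) = \sum_{i,j \in V} c_{ij} |x_i - x_j| \qquad (2)$$

$$\text{subject to } H(x) = \sum_{i,j \in V} d_i d_j |x_i - x_j| = 1, \qquad (3)$$

$$\sum_{i \in V} x_i = 0. \qquad (4)$$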

This optimization problem is equivalent to the ratio-cut problem in the sense 
described below. 



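Definition 2. A characteristic vector $x^A$ for a cut $(A, A')$ is defined such that
its components

$$x^A_i = \begin{cases} a, & i \in A \\ b, & i \notin A \end{cases} \qquad (5)$$

where

$$a = \frac{1}{2 \sum_{i \in A} d_i \sum_{i \in A'} d_i} \cdot \frac{|A'|}{|V|}, \qquad b = -\frac{1}{2 \sum_{i \in A} d_i \sum_{i \in A'} d_i} \cdot \frac{|A|}{|V|}.$$

It is straightforward from this definition that for a cut $(A, A')$, $x^A$ satisfies
the constraints (3, 4), and $F(x^A) = R(A, A')$.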
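The claim that the characteristic vector is feasible and reproduces the cut ratio can be verified numerically. The sketch below assumes that the sums in (2) and (3) run over ordered pairs of nodes, so that every unordered pair is counted twice, and that a and b are the values forced by constraints (3) and (4), with the overall sign chosen so that a is positive. Both are assumptions of this illustration, as are all names; integer capacities and positive integer weights are assumed for exact arithmetic.

```python
from fractions import Fraction

def objective(capacity, x):
    """F(x): sum over ordered pairs, i.e. twice the sum over listed edges."""
    return 2 * sum(c * abs(x[i] - x[j]) for (i, j), c in capacity.items())

def constraint(weight, x):
    """H(x): sum over all ordered pairs of nodes."""
    return sum(weight[i] * weight[j] * abs(x[i] - x[j])
               for i in weight for j in weight)

def characteristic_vector(weight, side_a):
    """Two-valued vector for the cut (A, A') that satisfies (3) and (4)."""
    a_side = set(side_a)
    rest = set(weight) - a_side
    # gap between the two values, fixed by H(x) = 1
    delta = Fraction(1, 2 * sum(weight[i] for i in a_side)
                        * sum(weight[i] for i in rest))
    n = len(weight)
    a = delta * len(rest) / n       # value on A
    b = -delta * len(a_side) / n    # value on A', fixed by the zero-sum constraint
    return {i: (a if i in a_side else b) for i in weight}
```

For the path 1-2-3-4 with unit capacities, weights (2, 1, 1, 3), and A = {1, 2}, the vector takes the values 1/48 and -1/48, satisfies both constraints, and gives F = 1/12, which equals the cut ratio 1/(3·4).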



Definition 3. For some feasible $x$ and some $p \in \mathbb{R}$ we call the cut $(P, P')$
positional, if $P = \{i \mid x_i \le p\}$, $P' = V \setminus P$, and both $P$ and $P'$ are non-empty.
The ratio of this cut is $R_p(x) = R(P, P')$. For a fixed $x$ we can speak of the minimum
positional cut $R_{min}(x)$ over all possible positional cuts obtained for different $p$:

$$R_{min}(x) = \min_{p \in \mathbb{R}} R_p(x).$$

Theorem 1. For each feasible $x^*$ of (2, 3, 4), $F(x^*) \ge R_{min}(x^*)$. Also
$F(x^*) > R_{min}(x^*)$ if there are at least two positional cuts with different values.
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Proof. Here we only sketch the main steps of the proof, for details omitted due to 
space limitations refer to [7]. Let us partition all nodes into three sets $U_1$, $U_2$, $U_3$
as follows.

$$y_1^* = \min_i x_i^*, \qquad y_2^* = \min_{i \mid x_i^* > y_1^*} x_i^*,$$

$$U_1 = \{i \mid x_i^* = y_1^*\}, \quad U_2 = \{i \mid x_i^* = y_2^*\}, \quad U_3 = \{i \mid x_i^* > y_2^*\}.$$

If $U_3$ is empty, there is exactly one positional cut, $x^*$ is in the form of a
characteristic vector, and $F(x^*) = R_{min}(x^*)$, which concludes the proof; otherwise define

$$y_3^* = \min_{i \mid x_i^* > y_2^*} x_i^*.$$

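We create a sub-problem of (2, 3, 4) by reducing the original one to a new
variable $y = (y_1, y_2, y_3)$, where $l_j = x_j^* - y_3^*$ denotes the fixed offset of a node
$j \in U_3$. Next, we further restrict it with $y_1 \le y_2 \le y_3$, so that we can
drop the absolute value signs, getting:

$$\min F_2(y) = 2 \Big( \sum_{i \in U_1, j \in U_2} c_{ij} (y_2 - y_1) + \sum_{i \in U_1, j \in U_3} c_{ij} (y_3 + l_j - y_1) + \sum_{i \in U_2, j \in U_3} c_{ij} (y_3 + l_j - y_2) \Big) + K, \qquad (6)$$

$$\text{subject to } H_2(y) = 2 \Big( \sum_{i \in U_1, j \in U_2} d_i d_j (y_2 - y_1) + \sum_{i \in U_1, j \in U_3} d_i d_j (y_3 + l_j - y_1) + \sum_{i \in U_2, j \in U_3} d_i d_j (y_3 + l_j - y_2) \Big) + P = 1, \qquad (7)$$

$$|U_1| y_1 + |U_2| y_2 + |U_3| y_3 = 0, \qquad (8)$$

$$y_1 \le y_2 \le y_3, \qquad (9)$$

where $P$ and $K$ are appropriately calculated constants.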

We have obtained a locally equivalent linear program in the sense that for $y^*$
the constraints (7, 8, 9) are satisfied and $F(x^*) = F_2(y^*)$. Also, if we can find
a better solution for (6)-(9), we can substitute the result back into the original
problem, giving a better feasible solution for it. From linear programming
theory it is known that we can find the optimal solution by examining the
vertices of the polytope defined by the constraints; in our case that means
points where one of the inequalities in (9) is satisfied with equality. Let us examine both
cases.

In case $y_1 = y_2$, after some calculations we get

$$F_2(y_1, y_1, y_3) = (1 - L) R(U_1 \cup U_2, U_3) + B \qquad (10)$$

for appropriate constants $L$ and $B$. And similarly in case $y_2 = y_3$

$$F_2(y_1, y_2, y_2) = (1 - L) R(U_1, U_2 \cup U_3) + B. \qquad (11)$$

We choose the solution $y$ with $y_1 = y_2$ if $R(U_1 \cup U_2, U_3) < R(U_1, U_2 \cup U_3)$,
otherwise we choose the solution with $y_2 = y_3$. It is evident that if these two ratio
costs are not equal, we get a strictly smaller function value. We substitute the
solution back into the original problem, obtaining a new $x$. $x$ is a feasible solution
of (2, 3, 4) with a smaller or equal function value and the set $U_2$ merged into $U_1$
or $U_3$. We repeat the described process until $U_3$ is empty. Let us analyze the
resulting $x$ and sets $U_1$ and $U_2$. We have $F(x) = R(U_1, U_2)$, where $(U_1, U_2)$ is
some positional cut of $x^*$ (in fact the minimal one), hence $F(x^*) \ge F(x) =
R_{min}(x^*)$. If we had some non-equal cuts compared during the process, we have
a strict decrease in the function and hence the second statement of the theorem
holds. □
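The first statement of Theorem 1 can also be checked numerically on small instances. The sketch below turns an arbitrary non-constant integer vector into a feasible point of (2, 3, 4) by centering and rescaling it, and then compares F with the best positional cut. It assumes that the sums in (2) and (3) run over ordered pairs and that capacities and weights are integers with positive weights; all names are hypothetical.

```python
import random
from fractions import Fraction

def cut_ratio(capacity, weight, side_a):
    """Exact ratio R(A, A'); capacity lists each undirected edge once."""
    a = set(side_a)
    rest = set(weight) - a
    cut = sum(c for (i, j), c in capacity.items() if (i in a) != (j in a))
    return Fraction(cut, sum(weight[i] for i in a) * sum(weight[i] for i in rest))

def objective(capacity, x):
    """F(x) as a sum over ordered pairs of nodes."""
    return 2 * sum(c * abs(x[i] - x[j]) for (i, j), c in capacity.items())

def constraint(weight, x):
    """H(x) as a sum over ordered pairs of nodes."""
    return sum(weight[i] * weight[j] * abs(x[i] - x[j])
               for i in weight for j in weight)

def make_feasible(weight, x):
    """Shift x to zero sum and rescale it so that H(x) = 1 (x must not be constant)."""
    mean = Fraction(sum(x.values()), len(x))
    centered = {i: v - mean for i, v in x.items()}
    scale = constraint(weight, centered)
    return {i: v / scale for i, v in centered.items()}

def min_positional_ratio(capacity, weight, x):
    """R_min(x): best ratio among the cuts P = {i | x_i <= p}."""
    thresholds = sorted(set(x.values()))[:-1]  # largest value would leave P' empty
    return min(cut_ratio(capacity, weight, {i for i in x if x[i] <= p})
               for p in thresholds)
```

For the path 1-2-3-4 with unit capacities and weights and the starting vector (0, 1, 3, 4), the feasible point has F = 2/7, while the best positional cut {1, 2} has ratio 1/4, so the inequality is strict, as expected when positional cuts with different values exist.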

Definition 4. A feasible $x^*$ is a local minimum of (2, 3, 4) if there exists $\varepsilon > 0$
such that $F(x^*) \le F(x)$ for each $x$ satisfying (3, 4) and $\|x - x^*\| < \varepsilon$.

Definition 5. A feasible $x^*$ is a strict local minimum of (2, 3, 4) if there exists
$\varepsilon > 0$ such that $F(x^*) < F(x)$ for each $x \ne x^*$ satisfying (3, 4) and $\|x - x^*\| < \varepsilon$.

Theorem 2. Each strict local minimum $x^*$ of (2, 3, 4) is in the form $x^* = x^A$
for some cut $(A, A')$.

Proof. We need to prove that in a strict local minimum the expression (5) holds
for some $a$ and $b$; the correct values for $a$ and $b$ are guaranteed by the constraints
(3) and (4). Assume to the contrary that there are more than two distinct values
among the components of $x^*$. We form the reduced problem as in Theorem 1,
obtaining equations (6)-(9). We are able to do that since $U_3$ is non-empty due to
our assumption. From linear programming theory, a strict local minimum can
only be at a vertex of the polytope defined by the constraints (7)-(9). However,
in our case $y^*$ cannot be at a vertex, since the number of constraints that hold
with equality at $y^*$ is less than the number of variables. Consequently, $x^*$ cannot be a strict
local minimum for the original problem. Hence our assumption is false and the
theorem is proven. □

There is a one-to-one correspondence between strictly locally minimal cuts and
strict local minima of (2, 3, 4), as stated in the following theorem.

Theorem 3. $x$ is a strict local minimum of (2, 3, 4) if and only if $x$ is a
characteristic vector of some strictly locally minimal cut $(A, A')$.

The proof is rather technical and is omitted due to space constraints. See [7] for 
details. 

Note also that for each non-strictly minimal ratio cut its characteristic
vector gives a local minimum of the function; however, the converse is
not true. There exist non-strict local minima of (2, 3, 4) with a function value
not equal to any locally minimal cut value.

Theorem 4. The minimum ratio-cut R is equal to the optimum solution of (2, 3, 4).

Proof. From Theorem 1, $F(x) \ge R_{\min}(x) \ge R$. For the characteristic vector
of the minimum ratio cut, F attains the value R. The claim follows immediately. □

Theorem 5. The ratio of each locally minimal cut $(A, A')$ is not greater than
two times the minimum ratio cut.

Proof. Let us denote the optimal cut by $(B, B')$. We form four sets $A \cap B$,
$A \cap B'$, $A' \cap B$, $A' \cap B'$ shown in Fig. 1. From Definition 1 the ratio of
each of the cuts $C_1 = (A \cap B, V - A \cap B)$, $C_2 = (A \cap B', V - A \cap B')$,
$C_3 = (A' \cap B', V - A' \cap B')$, $C_4 = (A' \cap B, V - A' \cap B)$ is at least as
large as the ratios of $C_{12} = (A, A')$ and $C_{23} = (B, B')$. We form a complete
graph by taking each of the sets $A \cap B$, $A \cap B'$, $A' \cap B$, $A' \cap B'$ as the
nodes of the graph and assign edge capacities as the sum of all edge capacities
in the original graph between the corresponding sets. We obtain



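$$R_1 = \frac{c_1 + c_4 + c_6}{d_1 (d_2 + d_3 + d_4)}, \qquad R_2 = \frac{c_1 + c_2 + c_5}{d_2 (d_1 + d_3 + d_4)},$$

$$R_3 = \frac{c_2 + c_3 + c_6}{d_3 (d_1 + d_2 + d_4)}, \qquad R_4 = \frac{c_3 + c_4 + c_5}{d_4 (d_1 + d_2 + d_3)},$$

$$R_{12} = \frac{c_2 + c_4 + c_5 + c_6}{(d_1 + d_2)(d_3 + d_4)}, \qquad R_{23} = \frac{c_1 + c_3 + c_5 + c_6}{(d_2 + d_3)(d_1 + d_4)},$$

$$R_1 \ge R_{12}, \quad R_2 \ge R_{12}, \quad R_3 \ge R_{12}, \quad R_4 \ge R_{12}.$$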

The statement of the theorem to be proven then translates to $R_{12} / R_{23} \le 2$. We
verified this using symbolic computation in Mathematica 4.0. See [7] for details. □
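The inequality in this proof can also be probed numerically. The following is only a sanity-check sketch, not a substitute for the symbolic verification: it samples random capacities and weights for the four-node quotient graph, keeps the samples that satisfy the four local-minimality constraints, and records the largest observed value of R12 / R23. The edge numbering (c1 to c6 joining node pairs (1,2), (2,3), (3,4), (1,4), (2,4), (1,3)) is inferred from the numerators of the formulas, and the function names and sampling ranges are hypothetical.

```python
import random


def ratios(c, d):
    """Return (R1, R2, R3, R4, R12, R23) for the 4-node quotient graph.

    c = (c1..c6): capacities of edges (1,2), (2,3), (3,4), (1,4), (2,4), (1,3)
                  (numbering inferred from the formulas, an assumption).
    d = (d1..d4): node weights.
    """
    c1, c2, c3, c4, c5, c6 = c
    d1, d2, d3, d4 = d
    r1 = (c1 + c4 + c6) / (d1 * (d2 + d3 + d4))
    r2 = (c1 + c2 + c5) / (d2 * (d1 + d3 + d4))
    r3 = (c2 + c3 + c6) / (d3 * (d1 + d2 + d4))
    r4 = (c3 + c4 + c5) / (d4 * (d1 + d2 + d3))
    r12 = (c2 + c4 + c5 + c6) / ((d1 + d2) * (d3 + d4))
    r23 = (c1 + c3 + c5 + c6) / ((d2 + d3) * (d1 + d4))
    return r1, r2, r3, r4, r12, r23


def worst_sampled_ratio(samples=20000, seed=0):
    """Largest R12 / R23 seen among random samples with R1..R4 >= R12."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        c = [rng.random() for _ in range(6)]
        d = [rng.uniform(0.1, 1.0) for _ in range(4)]
        r1, r2, r3, r4, r12, r23 = ratios(c, d)
        if min(r1, r2, r3, r4) >= r12 and r23 > 0:
            worst = max(worst, r12 / r23)
    return worst


# One instance that meets the bound with equality: a 4-cycle with unit node
# weights, capacity 2 on edges (2,3) and (1,4), capacity 1 on (1,2) and (3,4).
tight = ratios((1, 2, 1, 2, 0, 0), (1, 1, 1, 1))
```

The `tight` instance is one example constructed for this sketch; it is not claimed to be the graph of Fig. 2.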



Corollary 1. Each strict local minimum of (2, 3, 4) is not greater than two
times the minimum ratio cut.

The proof is straightforward from Theorem 3 and Theorem 5.

Fig. 1. Illustration to the proof of Theorem 5

Fig. 2. A graph achieving the bound of Theorem 5

The bound of Theorem 5 is tight; the graph with 4 nodes shown in Fig. 2
achieves this bound. One can easily make larger examples by substituting any
connected graph with sufficiently high edge capacities in place of the nodes of
the given graph.

4 The Algorithm 

In this section we present a heuristic algorithm based on standard
nondifferentiable optimization methods for finding a local minimum of (2, 3).
Finding a minimum of a nondifferentiable function is one of the well-explored
topics of nonlinear programming [5, 18]. One possibility is to approximate the
nondifferentiable function with a smooth one and apply one of the well-known
algorithms to find its local minimum [5, 3]. Often a better approach is to handle
it directly. Indeed, in our case we obtain a very simple and fast algorithm.

Most of optimization theory deals with convex problems, for which algorithms
with proven convergence can be developed. Many of these methods also work for
non-convex functions, finding a local minimum. In that case, however,
convergence cannot be shown, or can be shown only in a neighborhood of some
local minimum, which is not satisfactory in our case. The most basic algorithm
of nondifferentiable optimization is the subgradient algorithm [5, 18]. We will
adapt it for solving our problem. Since we apply it to a non-convex problem,
it should be considered mostly as a heuristic; however, practice shows that it
actually converges to a local minimum of our problem.

The algorithm is iterative. One iteration consists of moving in the negative
direction of a subgradient of the function by a fixed step and then performing
a projection onto the constraint (3). The constraint (4) was introduced for
technical reasons required in the proofs and need not be considered in the
algorithm. An appropriate subgradient q of F can be calculated with the
following equation for its components:

$$q_i = \sum_{j \in V} c_{ij}\,\mathrm{sign}(x_i - x_j)$$
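As an illustration, the subgradient can be accumulated edge by edge rather than node by node. The sketch below assumes that F is the capacity-weighted sum of |x_i - x_j| over the edges of the graph, which is what the graph context suggests; the edge-list representation and the function names are hypothetical.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def subgradient_F(x, edges):
    """Subgradient of F(x) = sum over edges of c_ij * |x_i - x_j| (assumed form).

    x     : list of node positions
    edges : list of (i, j, c_ij) tuples, each undirected edge listed once
    """
    q = [0.0] * len(x)
    for i, j, c in edges:
        s = c * sign(x[i] - x[j])
        q[i] += s  # contribution of edge (i, j) to component i
        q[j] -= s  # the same edge contributes the opposite sign to component j
    return q
```

Each edge is visited once, so the evaluation takes time proportional to the number of edges.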

Choosing the right step size is crucial for the convergence speed. Our heuristic
observation is that it should be proportional to the node spread. We choose the
step equal to stepFactor · ($x_{\max} - x_{\min}$), where stepFactor is a parameter.
Initially stepFactor = 1/14. Its update during the algorithm is discussed
later. The projection is performed by going in the direction of a subgradient of
the constraint until the constraint is reached. For the constraint H we can write
its subgradient r in a different form to allow its faster evaluation:

$$r_i = d_i \sum_{j \in V} d_j\,\mathrm{sign}(x_i - x_j) = d_i \Bigl( \sum_{j:\, x_j < x_i} d_j - \sum_{j:\, x_j > x_i} d_j \Bigr)$$

If we sort the $x_i$ values and consider them in increasing order, then the
needed sums can be updated incrementally, leading to linear time evaluation
(not counting the sorting). To perform the projection we need to calculate the
step length towards the constraint. To simplify the calculations we assume
that the ordering of the $x_i$ does not change during the projection. Then the
constraint function H becomes linear and the desired step can be easily
calculated from a linear equation: it is the point of value 1 on the ray defined
by the subgradient from the current point. In the case of unit node weights our
assumption is fulfilled. To see this, observe that the step on the function
always leads to a point with H < 1. Then, if we have unit node weights,
$r_i < r_k$ for all i and k satisfying $x_i < x_k$, so after the projection the
distance between them can only increase, thus keeping the same ordering. For the
case of arbitrary node weights this procedure gives a usable approximation of
the projection step length.
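The projection step can be sketched as follows. The sketch assumes that the constraint function is H(x) = sum over unordered node pairs of d_i d_j |x_i - x_j| and that constraint (3) reads H(x) = 1; both are assumptions consistent with the subgradient r and with the "point of value 1" mentioned above. On a region with a fixed ordering, H is linear and equals the inner product of r and x, so the step length t solves H(x) + t (r · r) = 1. Function names are hypothetical.

```python
def constraint_subgradient(x, d):
    """r_i = d_i * (weight of nodes below x_i - weight of nodes above x_i)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    total = sum(d)
    r = [0.0] * n
    below = 0.0
    k = 0
    while k < n:
        # Collect the group of nodes sharing the same position (sign = 0 inside).
        e = k
        group = 0.0
        while e < n and x[order[e]] == x[order[k]]:
            group += d[order[e]]
            e += 1
        above = total - below - group
        for t in range(k, e):
            i = order[t]
            r[i] = d[i] * (below - above)
        below += group
        k = e
    return r


def H(x, d):
    """Constraint function, evaluated as the inner product of r and x."""
    r = constraint_subgradient(x, d)
    return sum(ri * xi for ri, xi in zip(r, x))


def project(x, d):
    """Move along r until H = 1, assuming the node ordering does not change."""
    r = constraint_subgradient(x, d)
    rr = sum(ri * ri for ri in r)  # zero only if all positions coincide
    t = (1.0 - H(x, d)) / rr
    return [xi + t * ri for xi, ri in zip(x, r)]
```

For example, with unit weights and positions [0.0, 0.1, 0.3] the value of H is 0.6, and the projection moves only the two outer nodes apart until H reaches 1.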

The whole algorithm starts with a random initialization. We assign a random
position to each node such that H < 1 and then perform a projection
to obtain a feasible starting position. We experimented also with several other
initializations, but obtained significant improvements only for tree graphs. Since 
the optimal cut for trees can be found in linear time, we can construct a start- 
ing position that reveals it and the algorithm will only perform a few iterations 
to confirm that the solution does not improve. Hence to get a comprehensive 
picture of the algorithm behavior we decided to consider random initialization 
only. 

After the initialization we perform iterations until convergence. To tell when
the process has converged we track the minimum positional cut values obtained
at each iteration. The same values also tell us how to change the step size
parameter stepFactor. If the new cut value is lower than the previous one, then
we are making progress and should continue with the same step size. If the
value is higher than the previous one, then the step size is too large. If the
cut does not change, then we have a clue that the process has converged. Of
course we do not hurry to make decisions from one iteration only. Instead we
wait for a certain number of iterations, controlled by the constants
MAX_OSCILLATIONS and MAX_EQUAL, before making the decision. Such a delay also
improves the convergence speed by allowing the algorithm to iterate longer with
a larger step size.

To determine the positional cut value at each iteration we proceed as follows.
We consider the sorted sequence of nodes, calculate the positional cut in each
interval between two consecutive nodes and take the minimum one. Given the
sorted order, we can do this incrementally in time O(n + m),
where m is the number of edges in the graph.
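The incremental scan can be sketched as below. The ratio of a cut is taken to be its capacity divided by the product of the node weights on the two sides, as in the expressions used in the proof of Theorem 5. Skipping the gaps between nodes that share the same position is an assumption of this sketch, and the adjacency-list representation and names are hypothetical.

```python
def min_positional_cut(x, d, adj):
    """Return (ratio, left_nodes) of the minimum positional cut of x.

    x   : node positions
    d   : node weights
    adj : adj[v] is a list of (u, c_vu) pairs; each undirected edge appears
          in both adjacency lists
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    on_left = [False] * n
    total = sum(d)
    left_weight = 0.0
    cut = 0.0
    best_ratio, best_k = float("inf"), None
    for k in range(n - 1):
        v = order[k]
        on_left[v] = True
        left_weight += d[v]
        # Moving v to the left side: edges to left nodes stop crossing the
        # cut, edges to right nodes start crossing it.
        for u, c in adj[v]:
            cut += -c if on_left[u] else c
        if x[order[k + 1]] > x[v]:  # a real gap between consecutive positions
            ratio = cut / (left_weight * (total - left_weight))
            if ratio < best_ratio:
                best_ratio, best_k = ratio, k + 1
    left_nodes = order[:best_k] if best_k is not None else []
    return best_ratio, left_nodes
```

Every adjacency list is scanned exactly once, which gives the O(n + m) bound once the order is known.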

As suggested in one of the exercises in [5], the performance of this algorithm
can be improved by taking the previous directions into account. We add the
previous direction, reduced by some constant REDUCTION_FACTOR between 0 and 1,
to the current one. This models the motion of a heavy ball in the presence of a
force in the direction of the subgradient. In our experiments this modification
with REDUCTION_FACTOR = 0.95 performed substantially better.

All the steps described before can be implemented to run in time O(n + m).
Adding the time needed for sorting the nodes, one iteration takes time
O(m + n log n). The number of iterations is hard to estimate, so we will provide
experimental data in the next sections. The constants MAX_OSCILLATIONS and
MAX_EQUAL have the most impact on the iteration count and also on the quality of
the obtained cut, so we must select them carefully. After some experimentation
we chose MAX_EQUAL = 200 and MAX_OSCILLATIONS = 30.

algorithm ratio-cut
    calculate a random feasible initial position x
    acum = 0, oscillationCounter = 0, equalityCounter = 0, stepFactor = 1/14
    while (equalityCounter < MAX_EQUAL)
        x = x + acum
        d = direction(x)
        x = x + d
        acum = acum * REDUCTION_FACTOR + d
        if (minimum positional cut value has increased in this iteration)
            equalityCounter = 0
            oscillationCounter ++
        else if (minimum positional cut value has decreased in this iteration)
            equalityCounter = 0
        else equalityCounter ++
        if (oscillationCounter > MAX_OSCILLATIONS)
            stepFactor /= 1.3
            oscillationCounter = 0
    endwhile
end

function direction (x)
    d = subgradient (x)
    step = (xmax - xmin) * stepFactor
    x1 = x - d*step
    x2 = projection of x1 on the constraint
    return x2 - x
end

5 The Data

We evaluated the proposed algorithm on three families of graphs: random cubic 
graphs, random geometric graphs and random trees. We considered only graphs 
with unit weight nodes and edges. 

Random cubic graphs are potentially hard for ratio-cut algorithms, since
in [13] it was shown that there is actually a Θ(log n) gap between the minimum
ratio-cut and the maximum concurrent flow on these graphs. We generate
them using the algorithm provided in [19].

Random geometric graphs are a standard test suite for balanced cut problems
used in several papers [11, 25]. To generate a geometric graph we place the nodes
of the graph randomly in the unit square. Then we include an edge between each
two nodes that are within distance δ of each other, where δ is the minimum value
such that the resulting graph is connected.
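One way to obtain the minimum connecting distance is through a Euclidean minimum spanning tree: the smallest threshold that connects the points equals the longest edge of such a tree. The sketch below uses a quadratic-time version of Prim's algorithm, which is adequate only for small instances; it is an illustration under that assumption and not necessarily the generator used in the experiments. All names are hypothetical.

```python
import math
import random


def connectivity_threshold(points):
    """Smallest delta such that joining all pairs within delta connects the
    points; computed as the longest edge of a minimum spanning tree."""
    n = len(points)
    in_tree = [False] * n
    dist = [math.inf] * n  # distance of each point to the current tree
    dist[0] = 0.0
    delta = 0.0
    for _ in range(n):
        v = min((i for i in range(n) if not in_tree[i]), key=lambda i: dist[i])
        in_tree[v] = True
        delta = max(delta, dist[v])
        for u in range(n):
            if not in_tree[u]:
                dist[u] = min(dist[u], math.dist(points[v], points[u]))
    return delta


def random_geometric_graph(n, seed=0):
    """Random points in the unit square joined whenever within delta."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    delta = connectivity_threshold(points)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if math.dist(points[i], points[j]) <= delta]
    return points, delta, edges
```

For the large instances considered in the experiments, a grid-based or other spatial neighbor search would be needed instead of the all-pairs loops.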

Tree graphs are seemingly easy graphs because their optimal ratio-cut can 
be calculated in linear time. Also it is not hard to show that only one local 
minimum exists for the corresponding optimization problem. Nevertheless, it is 
an interesting family since our experiments indicate a slow convergence of our 
algorithm on these graphs. Also we can compare our result with the optimal 
one. We generate random trees with the classical algorithm where each tree is 
produced with the same probability. This algorithm produces long and skinny 
trees, which are particularly difficult for our algorithm. 
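A uniformly random labelled tree can be drawn, for instance, by decoding a random Prüfer sequence. Whether this is the "classical algorithm" meant above is an assumption of this sketch; the function names are hypothetical.

```python
import heapq
import random


def prufer_to_tree(prufer, n):
    """Decode a Pruefer sequence of length n - 2 into the n - 1 tree edges."""
    degree = [1] * n
    for v in prufer:
        degree[v] += 1
    leaves = [v for v in range(n) if degree[v] == 1]
    heapq.heapify(leaves)
    edges = []
    for v in prufer:
        leaf = heapq.heappop(leaves)  # smallest current leaf
        edges.append((leaf, v))
        degree[v] -= 1
        if degree[v] == 1:
            heapq.heappush(leaves, v)
    # Exactly two leaves remain; they form the last edge.
    edges.append((heapq.heappop(leaves), heapq.heappop(leaves)))
    return edges


def random_labelled_tree(n, seed=0):
    """Each labelled tree on n >= 2 nodes is produced with equal probability,
    because Pruefer sequences and labelled trees are in bijection."""
    if n < 2:
        return []
    rng = random.Random(seed)
    prufer = [rng.randrange(n) for _ in range(n - 2)]
    return prufer_to_tree(prufer, n)
```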

6 Experimental Results 

We implemented the algorithm in C++ and evaluated its performance on a computer
equipped with a Pentium III 800 MHz processor and 256 Mbytes of RAM.
For each graph family we measured the running time in seconds, the number 
of iterations and the quality of the produced cuts. Since we did not know the 
exact cut values for random and geometric graphs, we evaluated how much the 
ratio-cut value decreases when we continue the algorithm for the same number 
of iterations as performed before termination. Measuring the decrease of the 
ratio-cut value we can estimate how far the result is from the optimal. 

The algorithm was run on series of graphs of exponentially increasing size
from 100 up to 204800 nodes. Ten graphs were generated for each size and the
results were averaged. The average node degree for all graph families is constant.
Although we cannot specify the degree explicitly for geometric graphs, due to
their nature it was about 10 on all instances. The experimental results are given
in Figs. 3-8. For each graph family the figures show the running time and how
much the ratio-cut value improves when the iteration count is doubled.
Let us discuss the results separately for each graph family.

6.1 Cubic Graphs 

Although these graphs were suggested as hard, the algorithm performed very
well on them. It took on average about 35 minutes to partition the 204800 node
graphs. The dependence of the running time on the graph size is shown in Fig. 3.

Fig. 3. Running time for cubic graphs (horizontal axis: graph size)

Fig. 4. Quality for cubic graphs (horizontal axis: graph size)

Fig. 5. Running time for geometric graphs (horizontal axis: graph size)

Fig. 6. Quality for geometric graphs (horizontal axis: graph size)

Fig. 7. Running time for tree graphs

Fig. 8. Quality for tree graphs



When we approximated the running time with a function of the form $O(n^k)$, we
obtained an asymptotic running time of about $O(n^{1.6})$ on these graphs.
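Such an exponent can be estimated, for example, by a least-squares line fit in log-log coordinates: if the time behaves like a · n^k, then log(time) is linear in log(n) with slope k. The sketch below shows this on synthetic data only; it does not reproduce the measurements of this section, and the function name is hypothetical.

```python
import math


def fit_power_law(sizes, times):
    """Fit times ~ a * sizes**k by linear least squares on the logarithms.

    Returns (k, a)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    k = sxy / sxx
    a = math.exp(my - k * mx)
    return k, a


# Synthetic example: sizes doubling from 100, times following 3 * n**1.5.
sizes = [100 * 2 ** i for i in range(6)]
times = [3.0 * n ** 1.5 for n in sizes]
k, a = fit_power_law(sizes, times)
```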

The algorithm seems to find a cut very close to the optimal one, since after
doubling the iteration count the quality increased by less than 0.4%. Moreover,
the quality improved for the larger graphs, approaching 1 (see Fig. 4). Such
behavior is not surprising, since it is not hard to prove that the ratio of a cut
$(A, A')$ of a random cubic graph has, on average, a value that is independent of
the sizes of A and A' (a similar proof for general random graphs is presented
in [23]). It is then unlikely that the minimum ratio cut will be much different
from this average value.

6.2 Geometric Graphs 

The algorithm performed very well on these graphs both in terms of speed and
quality. It took on average about 1 hour to partition the 204800 node graphs. The
dependence of the running time on the graph size is shown in Fig. 5. The
asymptotical running time behavior on these graphs was about [...]. After doubling
the iteration count the quality increased by less than 5% (see Fig. 6). Also
visually the cuts seemed the best possible. Fig. 9 shows a typical cut of a
10000-node graph.

Fig. 9. The obtained cut for a 1000-node geometric graph

Fig. 10. Optimality for tree graphs

6.3 Tree Graphs 

The running time for trees was better than for the other families. The largest
graphs were partitioned in about 20 minutes. The dependence of the running time
on the graph size is shown in Fig. 7. The asymptotical running time on these
graphs was about [...]. However, the quality was poor. As shown in Fig. 10, the
obtained cuts were far from the optimal and the quality decreased with increasing
graph size. Also, doubling the iteration count showed a 10% to 20% quality
improvement (see Fig. 8).

When we explored the reason for the poor behavior further, we found
that convergence is much slower than for the other graph families and that the
stopping criterion does not work correctly in this case. When we allowed the
algorithm to run for a sufficiently long time, it always found the optimum
solution. However, we did not find a robust stopping criterion that works
correctly with tree graphs and does not increase the running time much for the
other graph families. As already mentioned, a smarter initialization can be used
to improve the quality of the partition if such tree or tree-like graphs are
common in some application.

7 Conclusions and Open Problems 

We have proposed a nondifferentiable optimization based method for solving
the ratio-cut problem and presented a heuristic algorithm implementing it. We
have shown that any strict local minimum is 2-optimal. The presented algorithm,
however, can in certain cases find a non-strict minimum, but we can easily
transform the obtained x vector into the characteristic form. Then the algorithm
can be run again from this starting position, and this process can be iterated
until the result does not change, giving a locally minimal cut, which by
Theorem 5 is 2-optimal.

The obtained algorithm is simple and fast and uses an amount of memory that
is proportional to the size of the graph. Its running time and quality were
verified experimentally. Its practical running time is about $O(n^{1.6})$ on our
test data. The algorithm produces high quality cuts on random cubic and geometric
graphs. On trees and other very sparse graphs the quality can be significantly
improved by choosing a better starting position than a random one. We evaluated
the algorithm on artificially generated data. As further work it would be
important to evaluate its performance on real-life problems. Although the
algorithm performed well on most graphs, it remains a heuristic. It is an open
question whether we can find a local minimum of (2, 3, 4) in polynomial time.

Abstract. The first main goal of this paper is to present Sketch-it!,
a framework aiming to facilitate development and experimental eval- 
uation of new scheduling algorithms. It comprises many helpful data- 
structures, a graphical interface with several components and a library 
with implementations of selected scheduling algorithms. Every schedul- 
ing problem covered by the classification-scheme originally proposed by 
Graham et al. [22] can easily be integrated into the framework. 

One of the more recent enhancements of this scheme, the so called broad- 
cast scheduling problem, was chosen for an extensive case study of Sketch- 
it!, yielding very interesting experimental results that represent the sec- 
ond main contribution of this paper. In broadcast scheduling many clients 
listen to a high bandwidth channel on which a server can transmit docu- 
ments of a given set. Over time the clients request certain documents. In 
the pull-based setting each client has access to a slow bandwidth chan- 
nel whereon it notifies the server about its requests. In the push-based 
setting no such channel exists. Instead it is assumed that requests for cer- 
tain documents arrive randomly with probabilities known to the server. 
The goal in both settings is to generate broadcast schedules for these 
documents which minimize the average time a client has to wait until 
a request is answered. 

We conduct experiments with several algorithms on generated data. We 
distinguish scenarios for which a slow feedback channel is very advanta- 
geous, and others where its benefits are negligible, answering the question 
posed in the title. 



1 Introduction 

During the past 40 years scheduling problems have received a lot of research
interest, and a huge theoretical background has been developed. For books on the
topic see for instance [13, 14, 15]. While working on scheduling problems it
would often be convenient to have a tool with which an algorithm could be
implemented and tested quickly. “Playing” with the algorithm can help in gaining
intuition of how it works, or on the other hand potentially speed up the finding
of counter-examples and bad cases. Quite often it is also meaningful to get hints
on the performance of an algorithm by having a quick glance at its empirical
behavior. Then again, some heuristics can only be evaluated by conducting such
experiments. Besides testing new algorithms, a tool which is able to animate the
progress of an algorithm could also prove very helpful for presentations or in
teaching topics of scheduling theory.

These points stimulated the development of Sketch-it!, a framework for
simulation of scheduling algorithms. To maximize the applicability, its design
was closely linked to the $\alpha|\beta|\gamma$ classification scheme, originally
proposed by Graham et al. [22]. Basically all problems covered by this scheme can
be tackled with the help of the framework.

In this paper we give a short overview of the framework, in order to intro- 
duce it to a broad audience. Furthermore we present experimental results in the 
broadcast scheduling domain, which were obtained with the help of Sketch-it!. 
The motivation for this is partly to demonstrate the usability of the tool, but
mainly we believe that the results are of interest in their own right. In the
next section we motivate and define broadcast scheduling.



Motivation and Problem Statement of Broadcast Scheduling. Due to the
increasing availability of infrastructure that supports high-bandwidth broadcast
and due to the growth of information-centric applications, broadcast scheduling
is gaining practical importance [4]. The general setting of the broadcast
scheduling problem is that (possibly many) clients request documents (e.g. web pages)
from a server, and the server answers these requests via a high-bandwidth chan- 
nel to which all clients are connected. If several clients have requested the same 
document, a single broadcast of this document satisfies all their requests simul- 
taneously. One wants to determine a broadcast schedule that optimizes some 
objective function, usually the average response time (the time a client has to 
wait on average until her request is satisfied; in the scheduling literature, this is 
also called the average flow time). 

There are two principally different settings: on the one hand on-demand or 
pull-based broadcasts and on the other hand push-based broadcasts. 

In the pull-based setting each client has access to a low-bandwidth feedback 
channel, e.g. a modem connection, whereon it notifies the server about its re- 
quests. Two examples of such systems are @Home network [24] and DirecPC [18], 
which provide Internet access via cable television and via satellite, respectively. 

In the push-based setting no feedback channel exists. The server tries to an- 
ticipate user behavior from the previously observed popularities of individual 
documents. Classical examples are Teletext systems, where the user can select
a page on her remote control and then has to wait until it appears in the
periodic broadcast. A more recent application is the SPOT technology announced
by Microsoft [34]. It enables special wristwatches (and in the future also other
devices) to receive personalized information — like weather, events and personal 
messages — via a dedicated radio frequency, which is shared for all broadcasts. 
The watches contain no transmitter, i.e. it is not possible to add a feedback 
channel. Clients configure their desired contents beforehand via a Web interface. 

In this paper we present the first direct comparison of the empirical perfor- 
mance of several well known pull-based on-line algorithms with a push-based 
algorithm on the same input traces. Such a comparison may help, e.g., in decid- 
ing whether or not to integrate feedback channels into a broadcast system. 

We now give a more precise problem definition. In the following we restrict
ourselves to the single channel case (at any point in time no more than one
document can be broadcasted); in the literature the case of W > 1 channels is
also considered. Furthermore we allow arbitrary-length documents and preemption,
i.e. a broadcast of a document can be interrupted and continued at a later point 
in time. We adopt the commonly made assumption that a client can buffer the 
last portion of a document and thus can start receiving the requested document 
immediately at any point of it. A request is satisfied as soon as all “units” of 
the document were received. 

Let m be the number of documents and n be the total number of requests.
By $l_i \in \mathbb{N}$ we denote the length of document $i \in \{1, \dots, m\}$. $R_j$,
where $j \in \{1, \dots, n\}$, is used both to address the j-th request and to
denote its arrival time. Let $D_j$ be the document requested by $R_j$ and $T = R_n$
be the arrival time of the last request.

The output of an algorithm in both the pull- and the push-based setting is
a schedule $S(t) \in \{0, \dots, m\}$, for $t \in \mathbb{R}_{\ge 0}$, giving which
document is broadcasted at time t, where $S(t) = 0$ means that no document is
broadcasted. For simplicity we define $S^i(t) := 1$ if $S(t) = i$, and
$S^i(t) := 0$ otherwise. Let $T'$ be the point in time when all requests $R_j$ are
answered by S; w.l.o.g. $S(t) = 0$ for $t > T'$.

Input and objective function in the two settings differ though. 

Pull-Based. Here the total average response time ART is given by
$\frac{1}{n} \sum_{j=1}^{n} F_j$, where $F_j := C_j - R_j$ is the response/flow time
of request j and $C_j$ is its completion time. More precisely
$C_j = C^{D_j}_{l_{D_j}}(R_j)$, with $C^i_k(t) := \inf\{x \mid \int_t^x S^i(\theta)\,d\theta = k\}$.
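For intuition, these quantities can be evaluated on a discretized schedule. The following sketch assumes unit time slots, integer arrival times, and the client-buffering model described above, in which a request for document i is completed once any l_i units of that document have been broadcast after its arrival. The slot representation and all names are illustrative assumptions.

```python
def completion_time(schedule, arrival, doc, length):
    """Earliest time by which `length` units of `doc` were broadcast in
    slots starting at or after `arrival`.

    schedule[t] is the document sent during [t, t + 1); 0 means idle.
    """
    received = 0
    for t in range(arrival, len(schedule)):
        if schedule[t] == doc:
            received += 1
            if received == length:
                return t + 1
    raise ValueError("request is not answered by this schedule")


def flow_times(schedule, lengths, requests):
    """F_j = C_j - R_j for each request (R_j, D_j)."""
    return [completion_time(schedule, r, doc, lengths[doc]) - r
            for r, doc in requests]


def art(flows):
    """Average response time."""
    return sum(flows) / len(flows)


def mrt(flows):
    """Maximum response time."""
    return max(flows)


# Two documents of length 2; document 1 is preempted by document 2.
schedule = [1, 1, 2, 1, 2, 1]
lengths = {1: 2, 2: 2}
requests = [(0, 1), (1, 2), (2, 1)]
flows = flow_times(schedule, lengths, requests)
```

In this example the third request arrives after the first broadcast of document 1 has finished and is served by the two later, non-contiguous slots of that document.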

$B1\,|\,r_j, \mathrm{pmtn}\,|\,\frac{1}{n} \sum F_j$ denotes the problem of minimizing
the ART in this setting. An on-line algorithm for this problem only knows of the
requests with $R_j < t$ when deciding which document to broadcast at time t. It is
called p-competitive if it computes a schedule whose ART is at most p times the
ART of an optimal solution.

A W-speed p-competitive algorithm computes a schedule on W channels 
whose ART is at most p times the ART of an optimal solution on one channel. 

Push-Based. Unlike in the pull-based setting, the algorithm does not learn the
actual requests $R_j$. Instead it is assumed that an infinite sequence of requests
is generated in a Poisson process, i.e. the request interarrival times are
exponentially distributed. The algorithm knows in advance the probabilities with
which a request $R_j$, $j \in \{1, \dots, n\}$, is for document $i \in \{1, \dots, m\}$.
It computes an infinite (periodic) schedule which minimizes the expected instead
of the average response time:
$\mathrm{ERT} := E(\lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} F_j) = E(F)$, where
$E(F)$ is the expected flow time of any request, given the Poisson assumption and
the request probabilities. The problem of minimizing the ERT is denoted by
$B1\,|\,r_j, \mathrm{pmtn}\,|\,E(F)$. Note that algorithms are analyzed in an average
case fashion; the output is usually not compared to an optimal solution for a
given input.

An algorithm for this problem is called a p-approximation if it runs in
polynomial time and computes a schedule whose ERT is at most p times the
optimum ERT.

For our experiments we naturally only consider a finite sequence of requests
$R_j$, $j \in \{1, \dots, n\}$. We limit the schedule $S(t)$ computed by a push-based
algorithm to the interval $[0, T']$, where $T'$ is the point in time when all
requests $R_j$ are answered by S, and as already mentioned above we assume
$S(t) = 0$ for $t > T'$.
We assume that the algorithm is fair and does not starve certain documents 
(i.e. T' is finite). 
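A finite request trace of the kind described above can be generated as in the following sketch: interarrival times are drawn from an exponential distribution and each request picks its document according to the given probabilities. The rate parameter, the seed handling and the function name are illustrative assumptions and do not describe the generator used in the experiments.

```python
import random


def generate_requests(n, rate, probabilities, seed=0):
    """Return n requests (arrival_time, document) of a Poisson process.

    rate          : expected number of requests per time unit
    probabilities : probabilities[i - 1] is the chance that a request asks
                    for document i (documents are numbered from 1)
    """
    rng = random.Random(seed)
    documents = list(range(1, len(probabilities) + 1))
    t = 0.0
    requests = []
    for _ in range(n):
        t += rng.expovariate(rate)  # exponentially distributed gap
        doc = rng.choices(documents, weights=probabilities)[0]
        requests.append((t, doc))
    return requests


trace = generate_requests(1000, rate=2.0, probabilities=[0.5, 0.3, 0.2], seed=42)
```

Feeding the same trace to a pull-based and a push-based algorithm allows their ART and MRT values to be compared on identical input.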

To facilitate the comparison of the results in the different settings, we will 
also compute the ART in the push-based case. Also in both settings let MRT 
denote the maximum response time of a request. 

Organization of this Paper and Contributions. In the next section we give an
overview of related work. Section 2 contains a high level description of the
Sketch-it! framework. Finally, in Section 3 we present our experimental results,
which attest to the comparatively good performance of the selected push-based
algorithm with respect to ART, independently of how many documents are present.
Furthermore, the quality of the schedules does not vary with the document sizes.
In contrast, the MRT comparatively increases with increasing m. A nice feature
of the push-based algorithm is that it is independent of the load of a system,
which means it scales nicely.



Related Work 

Simulation. A project similar to Sketch-it! has been under development since
1999 at the Institute for Algebra and Geometry of the Otto-von-Guericke
University in Magdeburg. Its name LiSA stands for Library of Scheduling
Algorithms, but actually it is specialized in shop scheduling. It is implemented
in C++. For detailed documentation see [25].

The Cheddar Project is a simulation tool for real time scheduling, which was 
developed by the LIMI/EA 2215 team at the University of Brest (France) using 
the Ada programming language. It comes with a graphical editor and a library of 
classical real time scheduling algorithms and feasibility tests. For documentation 
and download of the distribution see [33] . 

Pull-Based Broadcast. In [5, 6] Aksoy and Franklin introduce pull-based
broadcast scheduling and conduct experiments with a proposed on-line algorithm
for the case of unit-length documents and without preemption
($B1\,|\,r_i, p_i = 1\,|\,\frac{1}{n} \sum F_j$). Kalyanasundaram et al. [26], Erlebach
and Hall [20] and Gandhi et al. [21] consider approximation algorithms for the
corresponding off-line case. In [20] NP-hardness of the off-line setting is
proved (also for $B1\,|\,r_i, \mathrm{pmtn}\,|\,\frac{1}{n} \sum F_j$, with or without
client buffering).

Acharya and Muthukrishnan [4] adopt the stretch-metric from [12] to broadcast
scheduling and present experiments conducted with new on-line algorithms
for arbitrary-length documents. Among other things, they show that preemption
is very beneficial.

Edmonds and Pruhs [19] consider the case of arbitrary-length documents,
with preemption ($B1\,|\,r_i, \mathrm{pmtn}\,|\,\frac{1}{n} \sum F_j$) and without client
buffering. They prove that no 1-speed $o(n^{1/2})$-competitive algorithm exists
(this result extends to the case with client buffering). They furthermore present
the first positive result concerning the on-line setting: an O(1)-speed
O(1)-competitive algorithm.

In [11], $B1\,|\,r_i, \mathrm{pmtn}\,|\,\max F_j$ is addressed. They present a PTAS
(polynomial-time approximation scheme) for the off-line case and show that
First-Come-First-Serve is a 2-competitive on-line algorithm.

Push-Based Broadcast. Ammar and Wong [7, 8] study the case of unit-length
messages broadcasted on one channel ($B1\,|\,r_i, p_i = 1\,|\,E(F)$) in the context of
Teletext systems. For an arbitrary number of channels W, this setting is also
known as broadcast disks, treated e.g. in [1, 2]. There are several results for
the variation of the problem where each complete broadcast of document
$i \in \{1, \dots, m\}$ is additionally assigned a cost $c_i$. The goal is to minimize
the sum of the ERT and the average broadcast cost. Bar-Noy et al. [9] prove this
is NP-hard for arbitrary $c_i$. They furthermore give a 9/8- and a
1.57-approximation for $c_i = 0$ and for arbitrary $c_i$, respectively, for
$i \in \{1, \dots, m\}$. This is improved by Kenyon et al. [28]: they obtain a PTAS
if the number W of channels and the costs $c_i$ are bounded by constants.

For the case of arbitrary-length documents without preemption see for
instance [23, 38, 27]. For the case of arbitrary-length documents with preemption,
Schabanel [36] presents a 2-approximation for $BW\,|\,r_i, c_i, \mathrm{pmtn}\,|\,E(F)$
based on the approach in [9] and adapts an NP-hardness proof from [27].

Acharya et al. [3] conduct an experimental study on interleaving push- and
pull-based data broadcast for the case of unit-length documents.

2 Sketch-it! — A New Scheduling Simulation Framework 

The simulation framework was born in spring of 1999. It was developed under the 
supervision of Ernst W. Mayr, within the scope of the special research program 
SFB 342 of the German Research Foundation (DFG), part A7: “Efficient Parallel 
Algorithms and Schedules” . A distribution version will be available soon at [32] . 
Source-code documentation can be found at [31]. 

Aim. The original aim of the tool was to support research in the subject of
scheduling that was done by the Efficient Algorithms group at the computer
science department of the TUM. Such a tool should allow easy implementation of
scheduling algorithms that are given by verbal description or pseudo code, and it
should be flexible and open to all of the many different kinds of scheduling prob- 
lems, each of which has its own special parameter settings and limitations. The 
goal was to prevent the user from doing the unnecessary work that always has to 
be done for the simulation, logging and visualization overhead. This suggested 
the name Sketch-it! (which was of course also chosen because of the phonetic 
proximity to the term ‘schedule’). The researcher should be given a possibil- 
ity to experiment with his own algorithms and ideas. This involves not only 
the implementation of algorithms, but also the creation of particular problem 
instances. 

Another point of interest is the application of Sketch-it! as a teaching aid. 
This will be tested in an advanced seminar in the summer term of 2003. 



Development Tools and Libraries The simulation framework is being developed
in C++, using the standard compiler g++, provided by the GNU project.
Since the complexity of the considered algorithms is of prime importance, we
utilize the Library of Efficient Data types and Algorithms (LEDA) [30]. The
visualization part of the application is implemented with Qt [35], a portable
C++ development toolkit that mainly provides functionality for graphical user
interfaces (GUI).



Simulation Core The core of the simulator comprises C++ classes for the
different job types (single machine jobs, parallel jobs, malleable jobs, shop jobs,
broadcast requests), classes for the environments (single machine, identical /
uniform / unrelated parallel machines, flow / job / open shop), classes for the
network topologies (set, line, mesh, hypercube) and classes to represent
precedence constraints.

For a specific instance of a scheduling problem its jobs, machines and po- 
tentially precedence constraints are combined by a central class to form the 
task-system. 

In the following section we describe briefly how the user can generate such a 
task-system and how a newly implemented or already existing algorithm can be 
executed with this task-system as input. 



Usage 

Problem Instance Generator To support testing and empirical analysis of
algorithms, instances can be generated automatically. One only needs to provide
the parameters (environment, number of machines, number of jobs, probability
distribution and expectation / variance values for stochastic problems, etc.).
The tool then creates an instance of the requested problem and makes the
appropriate random decisions (where necessary).

Simulation After selecting an appropriate scheduling algorithm (see below for
a list of provided algorithms and how to implement a new one) the user can
follow its execution either step-by-step or continuously at a certain speed.
A Gantt chart, which is constantly updated, shows the temporal assignment of
jobs to machines. If precedence constraints are present, these can be displayed.
Jobs are highlighted according to their present state (i.e. released, available
with respect to precedence constraints, running, finished). Other views can show
the status of the machines (either in list form or in a special way, e.g. for a
mesh topology). The main window also contains textual comments concerning the
execution of the algorithm, e.g. about its current phase or about certain
operations it is performing.

[Screenshot of Sketch-it!]




Logging Some logging capabilities are integrated in the form of so-called
loggers, primarily to provide a standardized way to add new types of objective
functions and also to ease the collection of relevant data during the execution
of an algorithm. These loggers are triggered by a finely subdivided event
system, which notifies them when selected events occur (such as the release of
a new job, the allocation of an available job, the preemption of a running job,
the completion of a job, etc.). All standard objective functions (e.g. makespan,
sum of completion times, sum of flow times, etc.) are already implemented with
the help of such a logger.
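
The logger idea can be illustrated with a small sketch. The event kinds, the
`Logger` interface and the two logger classes below are hypothetical names
chosen for this illustration and are not taken from the Sketch-it! sources;
they only indicate how objective functions such as the makespan or the sum of
flow times could be computed purely from event notifications.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Hypothetical event kinds, modelled on the events named in the text.
enum class EventKind { Release, Allocation, Preemption, Completion };

struct Event {
    EventKind kind;
    int job;      // index of the job the event refers to
    double time;  // simulation time at which the event occurs
};

// A logger is notified of every event and accumulates one statistic.
class Logger {
public:
    virtual ~Logger() = default;
    virtual void notify(const Event& e) = 0;
    virtual double value() const = 0;
};

// Makespan: the latest completion time seen so far.
class MakespanLogger : public Logger {
    double makespan_ = 0.0;
public:
    void notify(const Event& e) override {
        if (e.kind == EventKind::Completion)
            makespan_ = std::max(makespan_, e.time);
    }
    double value() const override { return makespan_; }
};

// Sum of flow times: completion time minus release time, summed over all jobs.
class FlowTimeLogger : public Logger {
    std::map<int, double> release_;
    double sum_ = 0.0;
public:
    void notify(const Event& e) override {
        if (e.kind == EventKind::Release) {
            release_[e.job] = e.time;
        } else if (e.kind == EventKind::Completion) {
            auto it = release_.find(e.job);
            assert(it != release_.end());  // a job must be released before it completes
            sum_ += e.time - it->second;
        }
    }
    double value() const override { return sum_; }
};

// Feeds a recorded event sequence to a set of loggers.
void replay(const std::vector<Event>& events, const std::vector<Logger*>& loggers) {
    for (const Event& e : events)
        for (Logger* l : loggers)
            l->notify(e);
}
```

In this sketch a new objective function only requires a new `Logger` subclass;
the scheduling algorithm itself stays untouched.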

Algorithms In order to implement a new scheduling algorithm, code needs to
be inserted in dedicated functions for the “startup” and the “inner loop”. The
“startup” function is meant for initialization purposes and is called before the
actual execution. The “inner loop” function is called by Sketch-it! during the
execution in an on-line fashion, every time a new job/request is available/released
or a job/request was completed. Note that in case the algorithm needs to make
decisions at other points in time, it can return a delay at the end of the function
to notify Sketch-it! when it should be called again. Off-line algorithms have
complete access to the task-system and can compute the schedule already in the
“startup” function.

To give a feeling for the spectrum of problems which can be attacked with
Sketch-it!, an incomplete list of algorithms implemented to date follows:
Smith’s ratio rule for single machine scheduling, the algorithms of Coffman and
Graham, the parallel machine algorithm of McNaughton for scheduling with
preemptions, genetic algorithms for scheduling in an open shop environment,
an algorithm of Ludwig and Tiwari for scheduling malleable parallel tasks,
algorithms for stochastic scheduling (LEPT, SEPT, Smith weighted SEPT) and
algorithms for the broadcast scheduling problems considered here. For
descriptions of the multiprocessor task algorithms and of the algorithms for
stochastic scheduling problems see [37] and [29].

3 Comparing Push- and Pull-Based Data-Broadcasting 

In this section we present experiments which were conducted in order to compare 
the performance of push- to pull-based algorithms, with respect to the average 
response time (ART) and the maximum response time (MRT). Hereby the ART 
is emphasized, because it is the more widely accepted metric in the literature, 
and because a “good behavior” on the average seems more desirable. We compare 
several on-line algorithms for the pull-based setting to a push-based algorithm 
which is proven to be a 2-approximation for the expected response time (ERT). 
Note that the latter only gets the document probabilities $\pi_i$ as input.

In the experiments document sizes $l_i$ for $i \in \{1, \dots, m\}$ and requests
$(R_j, D_j)$ for $j \in \{1, \dots, n\}$ are generated at random. The ART/MRT of
the computed schedules can vary considerably from one of these instances to the
other. In order to obtain experimental data which can be compared easily across
the different runs, simple lower bounds for the ART/MRT are calculated for each
instance and the ART/MRT of the individual schedules are normalized by dividing
by the respective lower bound. These lower bounds are described in the next
section. In Section 3.2 we present the scheduling algorithms which were chosen
to compete against each other. Then, in Section 3.3 we go into the details of
how the instances are generated and finally discuss the results.

3.1 Simple Lower Bounds 

The maximum of all requested message lengths is clearly a lower bound for the
MRT. Let $L_M := \max\{ l_i \mid i \in \{1, \dots, m\},\ D_j = i \text{ for some } j \in \{1, \dots, n\} \}$
denote this bound.

Deriving a lower bound $L_A$ for the ART is more interesting. First note that
each request $R_j$, $j \in \{1, \dots, n\}$, obviously has to wait at least
$l_{D_j}$ units of time and thus $L_A \ge \frac{1}{n} \sum_{j=1}^{n} l_{D_j}$.

To strengthen this trivial lower bound we consider points in time for which
we know that at least one document is being broadcasted and others have to
wait. To this end we define $W_i(t) := 1$ if there is a $j \in \{1, \dots, n\}$
with $D_j = i$ and $R_j \in (t - l_i, t]$, and $W_i(t) := 0$ otherwise, where
$t \in [0, T_m]$ with $T_m := T + \max\{l_1, \dots, l_m\}$. In other words, if
$W_i(t) = 1$, there surely is at least one pending request for document $i$ at
time $t$. At any point in time $t \in [0, T_m]$ there are pending requests for
at least $k(t) := \sum_{i=1}^{m} W_i(t)$ individual documents. Only one of these
documents can be broadcasted at $t$, and the $\ge \max\{k(t) - 1, 0\}$ requests
for other documents must wait. This gives our lower bound:

$$L_A := \frac{1}{n} \sum_{j=1}^{n} l_{D_j} + \frac{1}{n} \int_0^{T_m} \max\{k(t) - 1, 0\}\, dt.$$
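
The two bounds can be evaluated with a simple sweep over time. The following
is a minimal sketch, not the code used for the experiments: it assumes that
$L_A$ is the trivial bound plus the integral term, both divided by $n$, and it
uses the observation that $W_i(t) = 1$ exactly on the union of the intervals
$[R_j, R_j + l_i)$ with $D_j = i$. The type `Request` and both function names
are hypothetical, and documents are indexed from 0.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

struct Request {
    double arrival;  // R_j, assumed non-negative
    int doc;         // D_j as a 0-based index into the length vector
};

// L_M: length of the longest document that is requested at least once.
double lowerBoundMRT(const std::vector<double>& len, const std::vector<Request>& reqs) {
    double lm = 0.0;
    for (const Request& r : reqs) {
        assert(r.doc >= 0 && r.doc < static_cast<int>(len.size()));
        lm = std::max(lm, len[r.doc]);
    }
    return lm;
}

// L_A: (sum of l_{D_j} + integral of max{k(t) - 1, 0} dt) / n.
double lowerBoundART(const std::vector<double>& len, std::vector<Request> reqs) {
    assert(!reqs.empty());
    std::sort(reqs.begin(), reqs.end(),
              [](const Request& a, const Request& b) { return a.arrival < b.arrival; });

    // Merge the intervals [R_j, R_j + l_i) per document and record each merged
    // interval as a +1 event at its start and a -1 event at its end.
    std::vector<std::pair<double, int>> events;
    std::vector<double> from(len.size(), 0.0), until(len.size(), -1.0);
    double trivial = 0.0;
    for (const Request& r : reqs) {
        assert(r.arrival >= 0.0 && r.doc >= 0 && r.doc < static_cast<int>(len.size()));
        trivial += len[r.doc];
        if (r.arrival > until[r.doc]) {  // gap: close the previous interval, if any
            if (until[r.doc] >= 0.0) {
                events.emplace_back(from[r.doc], +1);
                events.emplace_back(until[r.doc], -1);
            }
            from[r.doc] = r.arrival;
        }
        until[r.doc] = r.arrival + len[r.doc];
    }
    for (std::size_t i = 0; i < len.size(); ++i)
        if (until[i] >= 0.0) {
            events.emplace_back(from[i], +1);
            events.emplace_back(until[i], -1);
        }
    std::sort(events.begin(), events.end());

    // Sweep over time; k is the number of documents with W_i(t) = 1.
    double integral = 0.0, prev = 0.0;
    int k = 0;
    for (const auto& e : events) {
        integral += std::max(k - 1, 0) * (e.first - prev);
        k += e.second;
        prev = e.first;
    }
    return (trivial + integral) / static_cast<double>(reqs.size());
}
```

For two documents of lengths 2 and 3 that are requested at times 0 and 1, both
documents are pending during $[1, 2)$, so the integral contributes 1 and the
sketch yields $L_A = (5 + 1)/2 = 3$ and $L_M = 3$.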

3.2 Implemented Algorithms 

Pull-Based The first two are standard greedy on-line algorithms, known from 
traditional scheduling problems. The third was introduced in [4] and the fourth 
in [19]. 

• SSTF (Shortest Service Time First): As the name suggests, as time advances,
always broadcast the document which has a pending request that can be satisfied
most quickly among all pending requests. Note that a broadcast of a long
document might be preempted if a request for a short document arrives. This
on-line algorithm is optimal for the equivalent standard (unicast) scheduling
problem $1 \mid r_j, \mathit{pmtn} \mid \sum F_j$, where it is known as shortest
remaining processing time.

To show how easy it is to implement a scheduling algorithm with Sketch-it!,
we present below the actual source code inserted in the “inner loop”
(cf. Section 2) for SSTF:

    // Select the pending request with the smallest remaining processing time.
    list<SBroadcastRequ*> a = getActiveRequ();
    SBroadcastRequ *r, *p = NULL;
    double s = MAXDOUBLE;
    forall (r, a)                          // LEDA iteration macro
        if ( s > r->getRemProcTime() ) {
            s = r->getRemProcTime();
            p = r;
        }
    if ( p != NULL )
        broadcastMsg( p->getMsgIndex() );  // broadcast the selected document

Experiments with SSTF in [4] indicate good performance with respect to ART and
bad performance with respect to MRT.

• LWF (Longest Wait First): As time advances, always broadcast the document
with the currently maximum total waiting time, i.e. the maximum total time
that all pending requests for any document are waiting. In [26] it was
conjectured that LWF is an $O(1)$-speed $O(1)$-competitive algorithm.
Empirically it performs comparatively well with respect to MRT and not so well
with respect to ART [4].

• LTSF (Longest Total Stretch First): The current stretch of a request $R_j$ at
time $t \in [0, T']$ is the ratio of the time the request has been in the
system so far, $t - R_j$ (if it is still pending), respectively its response
time $F_j$ (if it is completely serviced), to the length $l_{D_j}$ of the
requested document. As time advances, LTSF always chooses to broadcast the
document which has the largest total current stretch, considering all pending
requests. It performs well empirically for ART; performance with respect to
MRT varies [4].

• BEQUI-EDF (Equi-partition, Earliest Deadline First): We only give a brief
description of the algorithm; for details we refer to [19]. BEQUI-EDF runs in
two stages. In the first stage Equi-partition is simulated: the broadcast
bandwidth is distributed among the documents which have pending requests, where
each document receives bandwidth proportional to the number of unsatisfied
requests for it. Each time a document was completely broadcasted in this
simulated first stage of the algorithm, a deadline is derived from the current
time and the time it took to broadcast the document. In the second stage, at
any time the document with the earliest deadline is broadcasted.

This algorithm is special, because it is so far the only algorithm with a proven
worst case performance. In [19] it is shown for any $\varepsilon > 0$ to be
$(1+\varepsilon)(4+\varepsilon)$-speed $O(1)$-competitive, if clients cannot
buffer and where the constant $O(1)$ depends on $\varepsilon$. In other words,
if we e.g. set $\varepsilon = 0.1$, the ART of a schedule computed by BEQUI-EDF
for one channel is at most $O(1)$ times the ART of an optimal schedule which
only has access to a channel with a bandwidth of
$1/((1+\varepsilon)(4+\varepsilon)) = 1/4.51 \approx 0.22$. Note that it is
impossible to obtain a 1-speed competitive algorithm with ratio better than
$o(n^{1/2})$, see Section 1.

Unfortunately the competitive analysis of BEQUI-EDF does not carry over 
to our model, where client buffering is enabled. We nevertheless think it is of 
interest to see how well an algorithm with provable worst case behavior (although 
for a slightly different model) performs compared to commonly used heuristics. 
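
The selection rules of LWF and LTSF described above differ only in how a
pending request contributes to the score of its document. The following is a
minimal sketch of both rules, written as a stand-alone function rather than
against the Sketch-it! interface; the names `Pending`, `Rule` and
`selectDocument` are hypothetical, and documents are indexed from 0.

```cpp
#include <cassert>
#include <vector>

struct Pending {
    double arrival;  // R_j
    int doc;         // D_j as a 0-based index
};

enum class Rule { LWF, LTSF };

// Returns the document to broadcast at time `now`, or -1 if nothing is pending.
// LWF sums the waiting times per document, LTSF sums the current stretches
// (waiting time divided by the length of the requested document).
int selectDocument(Rule rule, double now, const std::vector<double>& len,
                   const std::vector<Pending>& pending) {
    std::vector<double> score(len.size(), 0.0);
    std::vector<bool> requested(len.size(), false);
    for (const Pending& p : pending) {
        assert(p.doc >= 0 && p.doc < static_cast<int>(len.size()));
        assert(p.arrival <= now && len[p.doc] > 0.0);
        const double wait = now - p.arrival;
        score[p.doc] += (rule == Rule::LWF) ? wait : wait / len[p.doc];
        requested[p.doc] = true;
    }
    int best = -1;
    for (int i = 0; i < static_cast<int>(len.size()); ++i)
        if (requested[i] && (best < 0 || score[i] > score[best]))
            best = i;  // ties are broken in favour of the smaller index
    return best;
}
```

With a short document that has waited 2 time units and a document of length 10
that has waited 10 time units, LWF picks the long document while LTSF picks the
short one, which matches the different emphasis of the two rules.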



Push-Based Most results in the literature are concerned with the unit-length 
case, although there is some work on arbitrary-length messages, without preemp- 
tion, see Section 1. The algorithm proposed in [36] (which is a 2-approximation 
with respect to the ERT) is the only candidate to date for the push-based setting 
with arbitrary-length messages and preemption. 

• PUSH: The algorithm expects the probabilities $\pi_i$ as input. We simply
estimate these from the given requests $R_j$, i.e. we set
$\pi_i = \frac{1}{n} |\{ j \mid j \in \{1, \dots, n\},\ D_j = i \}|$.
For randomly generated instances one has direct access to the underlying
probabilities, see also next section.

The documents are split into pages of length 1 (this is possible because
$l_i \in \mathbb{N}$ for $i \in \{1, \dots, m\}$) and in each time-step a
document is selected and its next page in cyclic order is broadcasted.

Schabanel [36] considers the case where broadcast costs $c_i$ may be present.
If these are set to zero, the analysis can be considerably simplified and yields
that the pages of document $i \in \{1, \dots, m\}$ should be broadcasted such
that the expected distance between two consecutive broadcasts of $i$ is
$\tau_i := \mu / \sqrt{\pi_i l_i}$, where $\mu = \sum_{j=1}^{m} \sqrt{\pi_j l_j}$
is a normalizing factor. These expected distances can be achieved by a simple
randomized algorithm: at each time-step choose to broadcast document
$i \in \{1, \dots, m\}$ with probability $1/\tau_i$. Note that
$\sum_{i=1}^{m} 1/\tau_i = 1$. We implemented the derandomized greedy algorithm
also given in [36].
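
The simple randomized variant can be sketched in a few lines. This is not the
derandomized greedy algorithm used in the experiments; it only illustrates the
randomized rule, under the assumption that the expected distances have the form
$\tau_i = \mu / \sqrt{\pi_i l_i}$ with $\mu = \sum_j \sqrt{\pi_j l_j}$, so that
the probabilities $1/\tau_i$ sum to one. The function names are hypothetical
and documents are indexed from 0.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <utility>
#include <vector>

// Broadcast probabilities 1/tau_i = sqrt(pi_i * l_i) / mu.
std::vector<double> pushProbabilities(const std::vector<double>& pi,
                                      const std::vector<int>& len) {
    assert(pi.size() == len.size() && !pi.empty());
    std::vector<double> prob(pi.size());
    double mu = 0.0;
    for (std::size_t i = 0; i < pi.size(); ++i) {
        assert(pi[i] >= 0.0 && len[i] >= 1);
        prob[i] = std::sqrt(pi[i] * len[i]);
        mu += prob[i];
    }
    assert(mu > 0.0);
    for (double& p : prob) p /= mu;
    return prob;
}

// Randomized PUSH: in every time step pick a document at random and send its
// next page in cyclic order. Returns the (document, page) pairs that are sent.
std::vector<std::pair<int, int>> randomizedPush(const std::vector<double>& pi,
                                                const std::vector<int>& len,
                                                int steps, unsigned seed) {
    const std::vector<double> prob = pushProbabilities(pi, len);
    std::mt19937 gen(seed);
    std::discrete_distribution<int> pick(prob.begin(), prob.end());
    std::vector<int> nextPage(len.size(), 0);
    std::vector<std::pair<int, int>> schedule;
    for (int t = 0; t < steps; ++t) {
        const int i = pick(gen);
        schedule.emplace_back(i, nextPage[i]);
        nextPage[i] = (nextPage[i] + 1) % len[i];
    }
    return schedule;
}
```

For two equally popular documents of lengths 1 and 4 the sketch assigns
probabilities 1/3 and 2/3: the longer document is selected more often because
it needs four pages to complete.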



3.3 The Experiments 

Generation of Instances To be able to sample instances for broadcast
scheduling problems, Sketch-it!’s generator needed to be enhanced in a
straightforward way. We chose to generate the instances such that the
interarrival times of requests are exponentially distributed with rate
$\lambda$. This is a natural assumption which is often made in such a context,
e.g. in queuing theory. For each request $j \in \{1, \dots, n\}$ we choose
$D_j = i \in \{1, \dots, m\}$ at random with probability $\pi_i$. To obtain
realistic values for the probabilities $\pi_i$, we followed document
popularities in the Internet. These are widely believed to behave according to
Zipf’s Law, thus we choose

$$\pi_i \propto i^{-1},$$

assuming that index $i$ corresponds directly to the rank of a document, i.e.
$i = 1$ is the most popular document, $i = 2$ the second most, and so on. In
[10] it is stated that an exponent of $-1$ models reality well. PUSH can
directly be given these probabilities as input. Experiments showed that it
makes no perceivable difference for our setup whether PUSH obtains the $\pi_i$
or their estimates as described in the previous section (the following
discussion is based on the case where the $\pi_i$ are estimated).

It remains to choose a realistic distribution for the document sizes. [17]
contains a comprehensive study on file sizes in the Internet. It concludes that
it is reasonable to assume file sizes to be Pareto distributed:

$$\Pr(\text{file size} > x) \propto x^{-\alpha}$$

with $x \ge k$ and $\alpha \in (0, 2]$, $k > 0$, whereby it is stated in [17]
that $\alpha = 1.2$ is realistic. So as not to get arbitrarily large documents
we chose the bounded Pareto distribution (see e.g. [16]) for the interval
$[1, 100]$. Furthermore we rounded the sizes to integer values.
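
The generation procedure can be sketched as follows. This is an illustrative
reimplementation with the standard C++ random number facilities, not
Sketch-it!'s generator: the struct `Instance` and all function names are
hypothetical, the bounded Pareto sizes are drawn by inverting the distribution
function, and documents are indexed from 0 in order of popularity.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Zipf popularities with exponent -1: pi_i proportional to 1/i for rank i = 1..m.
std::vector<double> zipfProbabilities(int m) {
    assert(m > 0);
    std::vector<double> pi(m);
    double norm = 0.0;
    for (int i = 0; i < m; ++i) {
        pi[i] = 1.0 / (i + 1);
        norm += pi[i];
    }
    for (double& p : pi) p /= norm;
    return pi;
}

// Inverse distribution function of the bounded Pareto distribution on [lo, hi]
// with shape alpha; maps u = 0 to lo and u = 1 to hi.
double boundedParetoInverse(double u, double alpha, double lo, double hi) {
    assert(0.0 <= u && u <= 1.0 && alpha > 0.0 && 0.0 < lo && lo < hi);
    const double la = std::pow(lo, alpha), ha = std::pow(hi, alpha);
    return std::pow((ha - u * (ha - la)) / (ha * la), -1.0 / alpha);
}

struct Instance {
    std::vector<int> len;         // document sizes l_i, rounded to integers
    std::vector<double> arrival;  // request times R_j
    std::vector<int> doc;         // requested documents D_j (0-based rank)
};

// m documents, n requests, Poisson arrivals with rate lambda.
Instance generateInstance(int m, int n, double lambda, unsigned seed) {
    assert(n >= 0 && lambda > 0.0);
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::exponential_distribution<double> gap(lambda);
    const std::vector<double> pi = zipfProbabilities(m);
    std::discrete_distribution<int> pickDoc(pi.begin(), pi.end());

    Instance inst;
    for (int i = 0; i < m; ++i) {
        const double size = boundedParetoInverse(unif(gen), 1.2, 1.0, 100.0);
        inst.len.push_back(static_cast<int>(std::lround(size)));
    }
    double t = 0.0;
    for (int j = 0; j < n; ++j) {
        t += gap(gen);  // exponentially distributed interarrival time
        inst.arrival.push_back(t);
        inst.doc.push_back(pickDoc(gen));
    }
    return inst;
}
```

Sizes are drawn independently of the popularity rank, in line with the remark
in the discussion that sizes are varied independently of document popularities.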

Conducted Experiments Of many interesting questions, we selected two:

1. “How do the scheduling algorithms perform depending on the number of
documents m?” To this end we ran tests, stepwise increasing the number
of documents from $m = 2$ up to $m = 150$. For each fixed $m$ we
generated 100 instances with 1000-4000 requests each, as described in the
previous section. From this we calculate the mean and variance of
$\mathrm{ART}/L_A$ respectively $\mathrm{MRT}/L_M$ over all 100 instances for
each scheduling algorithm. The empirical variance shows how predictable a
scheduling algorithm is. It often is better to choose a predictable algorithm
(i.e. with low variance), even if on average it performs slightly worse than
others. We set the request arrival rate to $\lambda = 2$, that is, in
expectation 2 requests arrive per unit of time.

2. “How do the algorithms perform depending on how heavily the system is
loaded?” To simulate behavior with different loads, we set $\lambda = 2^b$ for
$b \in \{-6, \dots, 4\}$ instead of varying $m$. Everything else is done as
above; $m$ is set to 20.

Discussion of the Results We now discuss the plots in Figure 1 (appendix)
row by row.

1. ART: Most strikingly, SSTF behaves very badly, PUSH behaves about the
same as the others and BEQUI-EDF is second best. A closer examination of the
individual algorithms follows.

• Pull-Based: SSTF shows the worst behavior of all algorithms. For all m it 
not only has the highest ART, but also a very high variance, i.e. the quality of 
a solution is very unpredictable and depends heavily on certain properties of the 
current input. 

This outcome is quite surprising, because in [4] SSTF’s overall empirical 
performance concerning ART is very good compared to LWF, LTSF and other 
algorithms. 

A possible explanation for the observed behavior is that in some instances
requests for short documents appear often enough to starve a significant
number of requests for longer ones. In other words, constantly arriving
requests for some short document $i_1$ might each time preempt the broadcast of
a longer document $i_2$ such that more and more waiting requests for $i_2$
accumulate. This suggests that SSTF’s performance greatly depends on how long
popular documents are compared to slightly less popular ones. Note that SSTF is
the only pull-based algorithm which in no way takes into account the number of
requests waiting for the individual documents. This presumably explains its
exceptional status in all plots.

LWF performs worst with the exception of SSTF. In [4] it has the overall worst 
performance among the tested algorithms with respect to the ART. LTSF em- 
pirically has the lowest ART, this confirms [4], where it also performs very well. 
For BEQUI-EDF the results look very promising. This is somewhat astonishing 
because the algorithm first simulates Equi-partition and then inserts deadlines 
at seemingly late points in time. A possible reason why it nevertheless performs 
well is given below. 

• Push-Based: Also PUSH does comparatively well, which again is somewhat 
surprising: one could have expected a bigger advantage of the pull-based algo- 
rithms, especially when the number of documents in the system increases. This 
result is very interesting and shows that PUSH also empirically performs well 
on instances with exponentially distributed interarrival times. In particular it is 
very robust against variation of file sizes. Note: the sizes are varied independently 
of document popularities. 

2. MRT: We do not go into details, just note that the good performance of LWF 
seems somehow intuitive: if requests with the longest waiting time are greedily 
selected, the probability that some request is starved for a long period of time is 
very small. On the other hand the MRT of PUSH continuously increases. This 
might stem from the fact that with increasing number of documents the smallest 
document probability decreases. If such a document is requested, it might 
take a long time until it is completely broadcasted by PUSH. 

3. ART: For the mentioned reasons SSTF is again by far worst for $\lambda > 1$.
PUSH and BEQUI-EDF both are comparatively bad for low loads, but improve
with higher loads. For $\lambda = 8, 16$ they even show the best performance of
all algorithms.

It seems that the ART of PUSH stays at about the same level, independently
of $\lambda$. This would not be surprising: if all $\pi_i$ and $l_i$ are fixed,
PUSH computes a schedule which is independent of the choice of $\lambda$. Thus
the expected response time is also constant (this time is obviously independent
of $\lambda$ for a fixed schedule). So PUSH simply benefits from a heavily
loaded system because the pull-based algorithms cannot exploit their advantage
of knowing which documents are currently actually requested. This they can do
for lower loads, where PUSH also performs comparatively worse.

The ART of BEQUI-EDF is presumably so high in systems with low load, 
because for each request it has to wait quite long until it can set a corresponding 
deadline for EDF. The simulated Equi-partition algorithm divides the bandwidth 
according to the number of outstanding requests for the individual documents. 
Thus in systems with high loads this might implicitly give estimations of the
document probabilities $\pi_i$ for a certain time window. This could perhaps
explain why it behaves so similarly to PUSH.

4. MRT: Except for SSTF, BEQUI-EDF performs worst for high $\lambda$. This is not
astonishing because the algorithm is not at all trimmed to minimize the MRT. 
The MRT of PUSH again stays at about the same value, for the same reason as 
the ART does. 

4 Conclusions 

Sketch-it! proved itself very useful while conducting the experiments. It was quite
easy to implement the algorithms and also to enhance the generator and create 
the test suites. 

From the experiments we conclude that the pull-based algorithm SSTF does
not carry over well from traditional (unicast) scheduling. Furthermore the
push-based algorithm PUSH is very robust for the case of exponentially
distributed interarrival times. It performs well compared to the pull-based
algorithms, independently of the number of documents and the distribution of
file sizes among differently popular documents. On highly loaded systems it
even outperforms pull-based algorithms, because they cannot exploit their
advantage of knowing which documents the individual requests are actually for.
These results are quite promising for Microsoft’s SPOT technology, if one
assumes that primarily information like weather and news is broadcasted (i.e. a
reasonable number of documents, and the Poisson assumption is close to
reality). In particular they are promising because it is probable that the
system load is high: potentially a lot of users are listening to an extremely
low-bandwidth channel. Moreover PUSH’s empirically observed independence of the
load (which confirms theoretical results) makes this setting nicely scalable.
On the other hand, when the number of documents increases (e.g. if the
possibility to send personal messages is added to the system) the ART is still
reasonable (in the experiments about twice the amount of the best algorithm’s
output), but the MRT increases distinctly. This could mean that some users have
to wait long for messages of low popularity (e.g. personal messages) and might
get frustrated.

An interesting open question would be to find out how the algorithms perform 
on real world data (e.g. traces of Web-Servers) or generated data where the 
requests do not stem from a Poisson process. 

Fig. 1. Generated data. Plots of the ART, the variance of the ART, the MRT,
and the variance of the MRT as a function of the number of documents $m$
respectively the rate $\lambda$. Each quantity is normalized by the
corresponding lower bound, see text for details. For each data point 100 input
traces were generated, each containing 1000-4000 requests. For the top 4 plots
we chose $\lambda = 2$, i.e. in expectation 2 requests arrive in one unit of
time. For the lower 4 plots we chose $m = 20$.

Abstract. The uncapacitated facility location problem (UFLP) is a
problem that has been studied intensively in operational research. Recently
a variety of new deterministic and heuristic approximation algorithms
have evolved. In this paper, we consider five of these approaches:
the JMS and the MYZ approximation algorithms, a version of Local
Search, a Tabu Search algorithm as well as a version of the Volume
algorithm with randomized rounding. We compare solution quality and
execution times on different standard benchmark instances. With these
instances and additional material a web page was set up [26], where the
material used in this study is accessible.



1 Introduction 

The problem of locating facilities and connecting clients at minimum cost has
been studied widely in Operations Research. In this paper we focus on the
uncapacitated facility location problem (UFLP). We are given $n$ possible
facility locations and $m$ cities. Let $F$ denote the set of facilities and $C$
the set of cities. Furthermore there are non-negative opening costs $f_i$ for
each facility $i \in F$ and connection costs $c_{ij}$ for each connection
between a facility $i$ and a city $j$. The problem is to open a collection of
facilities and connect each city to exactly one facility at minimum cost.

Instead of solving this problem to optimality, we will focus on finding
approximate solutions. In the following we present five methods, which
originate in different areas of optimization research. We compare two
approximation algorithms, two heuristics based on local search, and one method
based on LP approximation and rounding, all of which were recently developed
and found to work well in practice.

1.1 Approximation Algorithms 

Recently some new approximation algorithms have evolved for the metric version
of the UFLP in which the connection cost function $c$ satisfies the triangle
inequality. A couple of different techniques were used in these algorithms, like
LP-rounding ([11], [24]), greedy augmentation ([10]) or primal-dual methods
([20], [10]). In terms of computational hardness, Guha and Khuller [13] showed
that it is impossible to achieve an approximation guarantee better than 1.463
unless $NP \subseteq DTIME(n^{O(\log \log n)})$. For our comparison we chose two
of the newest and most promising algorithms.



JMS-Algorithm The JMS-Algorithm uses a greedy method to improve the
solution. The notion of time that is involved was introduced in an earlier
3-approximation algorithm by Jain and Vazirani [20]. Later on Mahdian et al. [21]
translated the primal-dual scheme into a greedy 1.861-approximation algorithm.
In the third paper Jain, Mahdian and Saberi [19] presented the JMS-Algorithm
(JMS), which improved the approximation bound to 1.61. However, it had a
slightly worse complexity of $O(n^3)$ instead of $O(n^2 \log n)$. The following
sketch of JMS is taken from [22]:

1. At first all cities are unconnected, all facilities unopened, and the budget
of every city $j$, denoted by $B_j$, is initialized to 0. At every moment, each
city $j$ offers some money from its budget to each unopened facility $i$. The
amount of this offer is equal to $\max\{B_j - c_{ij}, 0\}$ if $j$ is
unconnected, and $\max\{c_{i'j} - c_{ij}, 0\}$ if it is connected to some other
facility $i'$.

2. While there is an unconnected city, increase the budget of each unconnected
city at the same rate, until one of the following events occurs:

(a) For some unopened facility $i$, the total offer that it receives from cities
is equal to the cost of opening $i$. In this case, we open facility $i$, and for
every city $j$ (connected or unconnected) which has a non-zero offer to $i$,
we connect $j$ to $i$.

(b) For some unconnected city $j$, and some facility $i$ that is already open,
the budget of $j$ is equal to the connection cost $c_{ij}$. In this case, we
connect $j$ to $i$.

One important property of the solution of this algorithm is that it cannot 
be improved by simply opening an unopened facility. This is the main advan- 
tage over the previous 1.861-algorithm in [21]. In [19] experiments revealed an 
appealing behavior of JMS in practice. 



MYZ Algorithm The MYZ algorithm could further improve the approxima- 
tion factor of JMS. Mahdian, Ye and Zhang [22] applied scaling and greedy 
augmentation to the algorithm. For the resulting MYZ Algorithm (MYZ) the 
authors could prove an approximation factor of 1.52 for the metric UFLP, which 
is at present the best known factor for this problem for any algorithm. MYZ is 
outlined below. In step 4 of the algorithm C is the total connection cost of the 
present solution and C' the connection cost after opening a facility u. 

1. Scale up all opening costs by a factor of $\delta = 1.504$

2. Solve the scaled instance with JMS

3. Scale down all opening costs by the same factor $\delta$

4. while there is an unopened facility $u$ for which the ratio
$(C - C' - f_u) / f_u$ is maximized and positive, open facility $u$ and update
the solution
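
Step 4, the greedy augmentation, can be sketched independently of JMS. The
code below is a minimal illustration and not the implementation used in this
study: the struct `Uflp` and the function names are hypothetical, facilities
and cities are indexed from 0, every opening cost is assumed to be strictly
positive so that the ratio is defined, and the connection cost is recomputed
from scratch for every candidate instead of being updated incrementally.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

struct Uflp {
    std::vector<double> open;               // open[i] = f_i, opening cost of facility i
    std::vector<std::vector<double>> conn;  // conn[i][j] = c_ij
};

// Total connection cost when every city uses its cheapest opened facility.
double connectionCost(const Uflp& p, const std::vector<bool>& isOpen) {
    const std::size_t cities = p.conn.empty() ? 0 : p.conn[0].size();
    double total = 0.0;
    for (std::size_t j = 0; j < cities; ++j) {
        double best = std::numeric_limits<double>::infinity();
        for (std::size_t i = 0; i < p.open.size(); ++i)
            if (isOpen[i]) best = std::min(best, p.conn[i][j]);
        total += best;
    }
    return total;
}

// Opening costs of all opened facilities plus the connection cost.
double totalCost(const Uflp& p, const std::vector<bool>& isOpen) {
    double cost = connectionCost(p, isOpen);
    for (std::size_t i = 0; i < p.open.size(); ++i)
        if (isOpen[i]) cost += p.open[i];
    return cost;
}

// Greedy augmentation: repeatedly open the unopened facility u that maximizes
// (C - C' - f_u) / f_u, as long as this ratio is positive.
void greedyAugment(const Uflp& p, std::vector<bool>& isOpen) {
    assert(isOpen.size() == p.open.size());
    assert(std::find(isOpen.begin(), isOpen.end(), true) != isOpen.end());
    for (;;) {
        const double c = connectionCost(p, isOpen);
        int bestU = -1;
        double bestRatio = 0.0;
        for (std::size_t u = 0; u < p.open.size(); ++u) {
            if (isOpen[u]) continue;
            assert(p.open[u] > 0.0);  // assumption: the ratio needs f_u > 0
            isOpen[u] = true;         // tentatively open u to obtain C'
            const double cPrime = connectionCost(p, isOpen);
            isOpen[u] = false;
            const double ratio = (c - cPrime - p.open[u]) / p.open[u];
            if (ratio > bestRatio) {
                bestRatio = ratio;
                bestU = static_cast<int>(u);
            }
        }
        if (bestU < 0) return;  // no unopened facility has a positive ratio
        isOpen[bestU] = true;
    }
}
```

In the full algorithm the starting solution would be the output of JMS on the
scaled instance; here any solution with at least one opened facility can be
passed in.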

1.2 Heuristic and Randomized Algorithms 

In terms of meta-heuristics, research activity has been less intense.
A simulated annealing algorithm [3] was developed, which produces good results
at the expense of high computation costs. Tabu search algorithms have been
very successful in solving the UFLP (see [2], [23], [25]). A very elaborate genetic 
algorithm has been proposed by Kratica et al. over a series of papers ([16], 
[17], [18]). Their final version involves clever implementation techniques and 
finds optimal solutions for all the examined benchmarks. 



Tabu Search In [23] Van Hentenryck and Michel proposed a simple Tabu
Search algorithm that works very fast and outperforms the genetic algorithm
in [18] in terms of solution quality, robustness and execution time. Therefore we
used this algorithm for the experiments. It uses a slightly different representation
of the problem. For a solution of the UFLP it is sufficient to know the set
$S \subseteq F$ of opened facilities. Cities are connected to the cheapest
opened facility, i.e. city $j$ is connected to $i \in S$ with
$c_{ij} = \min_{i' \in S} c_{i'j}$. A neighborhood move from $S$ to $S'$ is
defined as flipping the status of a facility $i$ from opened to closed
($S' = S \setminus \{i\}$) or vice versa ($S' = S \cup \{i\}$). When the status
of a facility was flipped, flipping back this facility becomes prohibited
(tabu-active) for a number of iterations. The number of iterations is adjusted
using a standard scheme (see [23] for details). The high level algorithm can be
stated as follows:

1. S ← an arbitrary feasible solution
2. Set cost(S*) = ∞
3. do
4.    bestgain = maximum cost savings over all possible non-tabu flips
5.    if (bestgain > 0)
6.       Apply random flip with best gain, update tabu lists and list length
7.    else close random facility
8.    Update S - connections of cities and datastructures
9.    if (cost(S) < cost(S*)) do S* ← S
10. while change of S* in the last 500 iterations
11. return S*

For every city $j$ the algorithm uses three pieces of information: the number
of the opened facility with the cheapest connection to $j$, the cost of this
connection, and the cost of the second cheapest connection to an opened
facility. With this information the gains of opening and closing a facility can
be updated incrementally in step 8. Thereby a direct evaluation of the
objective function can be avoided. The algorithm uses priority queues to
determine the second cheapest connections for each city. Due to these
techniques the algorithm has an execution time of $O(m \log n)$ in each
iteration.
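
How the three pieces of information determine the gain of a flip can be
sketched as follows. This is a simplified illustration, not the algorithm of
[23]: the per-city data is rebuilt from scratch rather than maintained
incrementally with priority queues, and the names `CityInfo`,
`collectCityInfo` and `flipGain` are hypothetical. Facilities and cities are
indexed from 0.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

struct CityInfo {
    int best;      // opened facility with the cheapest connection, -1 if none
    double cost1;  // cost of that connection
    double cost2;  // cost of the second cheapest connection to an opened facility
};

using Matrix = std::vector<std::vector<double>>;  // conn[i][j] = c_ij

// Collects the three pieces of information for every city.
std::vector<CityInfo> collectCityInfo(const Matrix& conn, const std::vector<bool>& isOpen) {
    const double inf = std::numeric_limits<double>::infinity();
    const std::size_t cities = conn.empty() ? 0 : conn[0].size();
    std::vector<CityInfo> info(cities, CityInfo{-1, inf, inf});
    for (std::size_t i = 0; i < conn.size(); ++i) {
        if (!isOpen[i]) continue;
        for (std::size_t j = 0; j < cities; ++j) {
            if (conn[i][j] < info[j].cost1) {
                info[j].cost2 = info[j].cost1;
                info[j].cost1 = conn[i][j];
                info[j].best = static_cast<int>(i);
            } else if (conn[i][j] < info[j].cost2) {
                info[j].cost2 = conn[i][j];
            }
        }
    }
    return info;
}

// Cost saving obtained by flipping facility i: opening it if it is closed,
// closing it if it is open. No evaluation of the full objective is needed.
double flipGain(const std::vector<double>& openCost, const Matrix& conn,
                const std::vector<bool>& isOpen, const std::vector<CityInfo>& info, int i) {
    assert(i >= 0 && i < static_cast<int>(openCost.size()));
    double gain = 0.0;
    if (!isOpen[i]) {
        // Opening i: every city that is closer to i than to its current facility saves.
        for (std::size_t j = 0; j < info.size(); ++j)
            gain += std::max(0.0, info[j].cost1 - conn[i][j]);
        return gain - openCost[i];
    }
    // Closing i: every city served by i falls back to its second cheapest facility.
    for (std::size_t j = 0; j < info.size(); ++j)
        if (info[j].best == i) gain -= info[j].cost2 - info[j].cost1;
    return gain + openCost[i];
}
```

Closing the only opened facility yields a gain of minus infinity in this
sketch, so such an infeasible move is never selected as the best flip.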



Local Search The Local Search community has only paid limited attention 
to the UFLP so far. Apart from the Tabu Search algorithms there have been 
a few simple local search procedures proposed in [15], [10]. In this paper we 
use the simple version of Arya et al. [4], for which the authors could prove an
approximation factor of 3 on metric instances. The algorithm works with the 
set S of opened facilities as a solution. An operation op is defined as opening 
or closing a facility or exchanging the status of an opened and a closed facility. 
To improve the execution time of the algorithm we incorporated the use of 
incremental datastructures from the Tabu search algorithm and preferences for 
the simple moves as follows. We generally prefer applying the simple flips of 
opening and closing a facility (denoted as ops). As in the Tabu Search we apply 
one random flip of the flips resulting in best gain of the cost function. When 
these flips do not satisfy the acceptance condition, we pick the first exchange 
move found that would give enough improvement. If there is no such move left, 
the algorithm stops. This modified version can be stated as follows: 

1. S ← an arbitrary feasible solution 

2. exitloop ← false 

3. while exitloop = false 

4.     while there is an ops such that cost(ops(S)) < (1 - ε/p(n, m)) cost(S) 

5.         find a random ops* of the ops with best gain 

6.         do S ← ops*(S) 

7.     if there is an op such that cost(op(S)) < (1 - ε/p(n, m)) cost(S) 

8.         do S ← op(S) 

9.     else exitloop ← true 

10. return S 

In our experiments the parameters were set to ε = 0.1 and p(n, m) = n + m. 
Arya et al. suggested that the algorithm should be combined with the standard 
scaling techniques [10] to improve the approximation factor to 2.414. 
Interestingly, this version performs worse in practice. Therefore the version 
without scaling (denoted as LOCAL) was used for the comparison with the other 
algorithms. More on the unscaled and scaled versions of Local Search can be found 
in section 2.5 and [14]. 
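
The unscaled procedure can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the code used in the experiments: every candidate move is evaluated directly instead of with incremental data structures, costs are assumed to be non-negative, the acceptance factor is taken as 1 - ε/p(n, m) with p(n, m) = n + m, and all names as well as the toy instance are hypothetical.

```python
import random


def solution_cost(S, f, c):
    """Opening costs of S plus the cheapest connection for every city."""
    return sum(f[i] for i in S) + sum(
        min(c[i][j] for i in S) for j in range(len(c[0])))


def local_search(f, c, eps=0.1, seed=0):
    rng = random.Random(seed)
    num_fac, num_city = len(f), len(c[0])
    factor = 1.0 - eps / (num_fac + num_city)   # p(n, m) = n + m
    S = frozenset([rng.randrange(num_fac)])     # arbitrary feasible start
    while True:
        # Steps 4-6: apply simple flips while one of them improves enough.
        while True:
            current = solution_cost(S, f, c)
            flips = [(solution_cost(S ^ {i}, f, c), i)
                     for i in range(num_fac) if S ^ {i}]
            if not flips:
                break
            best = min(value for value, _ in flips)
            if not best < factor * current:
                break
            # Random choice among the flips with the best gain.
            S = S ^ {rng.choice([i for value, i in flips if value == best])}
        # Steps 7-9: otherwise take the first exchange that improves enough.
        current = solution_cost(S, f, c)
        swap = next(
            ((S - {i}) | {k}
             for i in sorted(S) for k in range(num_fac)
             if k not in S
             and solution_cost((S - {i}) | {k}, f, c) < factor * current),
            None)
        if swap is None:
            return set(S)
        S = swap


# Hypothetical toy instance: 3 facilities (rows of c), 4 cities (columns).
f = [3, 2, 4]
c = [[1, 4, 6, 3], [5, 2, 2, 7], [4, 6, 1, 2]]
result = local_search(f, c)
```

Every accepted move strictly lowers the cost, so the loop terminates; the returned set is a local optimum with respect to flips and exchanges under the acceptance factor.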



Volume Algorithm For some of the test instances we obtained a lower bound 
using a version of the Volume algorithm, which was developed by Barahona 
in [6]. The Volume algorithm is an iterated subgradient optimization procedure, 
which is able to provide a primal solution and a lower bound on the optimal 
solution cost. To improve solution quality and speed up the computation, 
Barahona and Chudak [7] used the rounding heuristic (RRWC) presented in [11] to 
find good upper bounds on the optimal dual solution cost and thereby reduced 
the iterations of the Volume algorithm. However, this approach has generally 
very high execution times in comparison to the other methods presented here. 
Instead we used a faster version of this algorithm which involves only a basic 
randomized rounding procedure and slightly different parameter settings. It will 
be denoted by V&RR and is available on the web page of the COIN-OR project 
by IBM [5]. Regarding solution quality and execution time this algorithm is 
generally inferior to the other algorithms. The results should only be seen as 
benchmark values of available optimization code. We will not go into detail de- 
scribing the method, the code or the parameter settings here. The interested 
reader is referred to [5], [6], [7] for the specific details of the algorithm and the 
implementation. 

2 Experiments 

We tested all given algorithms on several sets of benchmark instances. First we 
studied the Bilde-Krarup benchmarks, which were proposed in [9]. These are 
non-metric small scale instances with n × m from 30 × 80 to 50 × 100. We chose 
this set because it is randomly generated, non-metric and involves the notion of 
increasing opening costs also present in the large scale k-median instances. Next 
we focused on small scale benchmarks proposed by Galvao and Raggi in [12]. 
These are metric instances with n = m from 50 to 200, which we chose because they 
make use of the shortest path metric and a Normal distribution to generate costs. 
Then we examined the performance on the cap instances from the ORLIB [8] 
and the M* instances, which were proposed in [18]. These are non-metric small, 
medium and large scale instances with n × m from 16 × 50 to 2000 × 2000. They have 
previously been used to examine the performance of many heuristic algorithms. 
Finally we studied metric large scale instances with n = m from 1000 to 3000, which 
were proposed in [1] and used as UFLP benchmarks for testing the performance 
of the Volume algorithm in [7]. So our collection of benchmark instances covers 
a variety of different properties: small, medium and large size; Euclidean metric, 
shortest path metric and non-metric costs; randomly generated costs from 
uniform distributions and Normal distributions. 

On all instances we averaged over the performance of 20 runs for each 
algorithm. The experiments were done on an 866 MHz Intel Pentium III running Linux. 
For most problems we used CPLEX to solve the problems to optimality. The 
CPLEX runs were done on a 333 MHz Sun Enterprise 10000 with UltraSPARC 
processors running UNIX. The execution times are about a factor of 2.5 
higher than the times for the algorithms. Here we only report average results for 
the different benchmark types. For more detailed results of our experiments and 
values for the single instances the reader is referred to [14]. 

We set up a web page with all benchmark instances, implementations of all 
algorithms and benchmark generators. All material used in this study can be 
accessed online at the UflLib [26]. 

Table 1. Parameters for the Bilde-Krarup problem classes 

Type   Size (n × m)   f_i                              c_ij 
B      50 × 100       Discrete Uniform (1000, 10000)   Discrete Uniform (0, 1000) 
C      50 × 100       Discrete Uniform (1000, 2000)    Discrete Uniform (0, 1000) 
Dq*    30 × 80        Identical, 1000*q                Discrete Uniform (0, 1000) 
Eq*    50 × 100       Identical, 1000*q                Discrete Uniform (0, 1000) 

* q = 1, ..., 10 







2.1 Bilde-Krarup Instances 

The Bilde-Krarup instances are small scale instances of 22 different types. The 
costs for the different types are calculated with the parameters given in Table 1. 
As the exact instances are not known, we generated 10 test instances for each 
problem type. In Table 3 we report the results of the runs for each algorithm. 
In columns ’Opt’ we report the percentage of runs that ended with an optimal 
solution. In columns ’Error’ we report the average error of the final solution in 
percentage of the optimal solution, in columns ’Time’ the average execution time 
in seconds. In column ’CPX’ we denoted the average execution time of CPLEX 
to solve the instances. 

The deterministic algorithms perform quite well on these instances. The average 
error is 2.607% at maximum although the problems are not of metric nature. 
MYZ performs significantly better than JMS in terms of solution quality. It can 
solve an additional 37 problems to optimality and has a lower average error. The 
execution time is slightly higher because it uses JMS as a subroutine. 

For the heuristic algorithms TABU provides the best results. It was able to solve 
problems of all classes to optimality in a high number of runs. Unfortunately 
it also is much slower than LOCAL, MYZ and JMS. LOCAL also performs 
competitively on most of these problem classes. Like TABU it is able to 
solve problems of all classes to optimality, but the overall number of instances 
solved is much lower. In terms of execution time it is much faster, though. 
V&RR is outperformed by all of the other algorithms. It has the highest 
execution time and the worst solution quality. 

2.2 Galvao-Raggi Instances 

Galvao and Raggi proposed unique benchmarks for the UFLP. A graph is given 
with an arc density δ, which is defined as δ = connections present / (m · n). 
Each present connection has a cost sampled from a uniform distribution in the 
range [1, n] (except for n = 150, where the range is [1, 500]). The connection 
costs between a facility i and a city j are determined by the shortest path 
from i to j in the given graph. The opening costs f_i are assumed to come from 
a Normal distribution. Originally Galvao and Raggi proposed problems with 
n = m = 10, 20, 30, 50, 70, 100, 150 and 200. We will consider the 5 largest types. 
The density values and the parameters for the Normal distribution are listed in 
Table 2. The exact instances for these benchmarks are not known. As for the 
Bilde-Krarup benchmarks we generated 10 instances for each class. The results 
of our experiments are reported in Table 3. Columns 'Opt', 'Error' and 'Time' 
are defined as before. We also included the average execution times of CPLEX 
in column 'CPX'. 

Table 2. Parameters for the Galvao-Raggi problem classes 

Size   δ       Parameters for f_i 
               mean     stand. dev. 
50     0.061   25.1     14.1 
70     0.043   42.3     20.7 
100    0.025   51.7     28.9 
150    0.018   186.1    101.5 
200    0.015   149.5    94.4 

JMS performs slightly better than MYZ on these metric instances. Of the 
heuristic and randomized algorithms V&RR performs very well, even better than 
TABU and LOCAL, at the expense of high execution times. In fact, the times 
are prohibitively high as the algorithm needs much more time than CPLEX 
to solve the instances to optimality. LOCAL performs a little better than 
TABU, because both the execution times and the average errors are smaller. 
However, it does not find optimal solutions very reliably. 

2.3 ORLIB and M* Instances 

The cap problems from the ORLIB are non-metric medium sized instances. The 
M* instances were designed to represent classes of real UFLPs. They are very 
challenging for mathematical programming methods because they have a large 
number of suboptimal solutions. In Table 4 we report the results for the different 
algorithms. In columns ’Opt’ we again denote the percentage of runs that ended 
with an optimal solution. In Columns ’Error’ we report the average error of the 
final solution in percentage of the optimal solution. For the larger benchmarks 
the optimal solutions are not known. Instead we used the best solutions found 
as a reference, which for all benchmarks were encountered by TABU. All values 
that do not relate to an optimal solution are denoted in brackets. In columns 
’Time’ we report average execution times in seconds. Furthermore in column 
’CPX’ we report the average execution time of CPLEX. 

Again the deterministic algorithms perform very well. The maximum error for 
both methods was produced on the capa benchmark. Of the deterministic 
algorithms MYZ performed better than JMS. It was able to solve an additional 
6 problems to optimality. JMS could only achieve a better performance in 4 
of the 37 benchmarks. In terms of execution time MYZ becomes slightly less 
competitive on larger problems because the additional calculations of the greedy 
augmentation procedure need more time. 

With a maximum average error of 0.289% TABU again is the algorithm with 
the best performance on these benchmarks. It is able to solve all problems to 
optimality, in most cases with a high frequency. Hence, our results are 
consistent with the values reported in [23]. However, our code is 
significantly faster than the implementation of Michel and Van Hentenryck 
on a similar computer (by a factor of 2 and more). Compared 
to TABU the solution quality of LOCAL is not very competitive. It fails to 
find optimal solutions on 9 problems, 7 of which are cap benchmarks. The 
execution times, however, are very competitive, as it performs in most cases 
significantly better than TABU. 

V&RR performs generally worse than the other algorithms. On some of the cap 
instances it achieves good solution quality. On the M* instances, however, it per- 
forms worse than all other algorithms in terms of solution quality and execution 
time. The execution times for the small problems exceed the times of CPLEX 
again. The practical use of this algorithm for small problems should therefore 
be avoided. For problems with m, n > 100, however, the execution times of CPLEX 
become significantly higher. 

2.4 k-Median Instances 

In this section we take a look at large scale instances for the UFLP. The 
benchmarks considered here were originally introduced for the k-median problem in [1]. 
In [7] they were used as test instances for the UFLP. To construct an instance, 
we pick n points independently and uniformly at random in the unit square. Each 
point is simultaneously city and facility. The connection costs are the Euclidean 
distances in the plane. All facility opening costs are identical. To prevent 
numerical problems and preserve the metric properties, we rounded up all data to 
4 significant digits and then made all the data entries integer. For each set of 
points, we generated 3 instances by setting all opening costs to $\sqrt{n}/10$, 
$\sqrt{n}/100$ and $\sqrt{n}/1000$, respectively. Each opening cost defines a 
different instance with different properties. In [1] the authors showed that, 
when n is large, any enumerative method based on the lower bound of the relaxed 
LP would need to explore an exponential number of solutions. They also showed 
that the solution of the relaxed LP is, asymptotically in the number of points, 
about 0.998 of the optimum. 
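
The construction of such an instance can be sketched as follows. This is only an illustration under assumptions: the function name and the seed handling are hypothetical, and the rounding to 4 significant digits with the subsequent conversion to integers is left out because its exact procedure is not specified here.

```python
import math
import random


def kmedian_instance(n, divisor, seed=0):
    """Random points in the unit square; every point is city and facility.

    divisor is 10, 100 or 1000, giving opening costs sqrt(n)/divisor.
    """
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    # Connection costs are Euclidean distances in the plane.
    dist = [[math.dist(p, q) for q in points] for p in points]
    # All facility opening costs are identical.
    opening = [math.sqrt(n) / divisor] * n
    return points, dist, opening


# Three instances on the same point set, one per opening cost.
instances = [kmedian_instance(50, divisor, seed=1) for divisor in (10, 100, 1000)]
```

Using the same seed for all three calls keeps the point set fixed, so the three instances differ only in their opening costs.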

In Table 4 we report the results of our experiments for n = 1000, 2000, 3000. In 
column ’LB’ we provide the lower bound on each problem calculated by V&RR. 
For each algorithm we report the average error and the average execution time. 
All errors were calculated using the lower bound in ’LB’. 

On these metric benchmarks JMS again performs slightly better than MYZ. 
TABU is the best algorithm in terms of solution quality. LOCAL manages to 
find better solutions than the deterministic algorithms, but it is much slower 
than TABU, JMS and MYZ. The performance of V&RR is not competitive in 
comparison to the other algorithms. It is outperformed in terms of solution 
quality and execution time by all algorithms on nearly all benchmarks. Only on the 
larger benchmarks with small opening costs are the execution times of LOCAL 
equally slow. One reason for this is the use of priority queues. For the problems 
with smaller opening costs, optimal solutions open a high number of facilities, 
and the operations on the queues become expensive. When implemented 
without queues, the adjustment of the data structures when opening a facility 
(which is the operation used more often here) could be executed in O(m). The 
closing operation would need O(nm), which leads to inferior execution times on 
average. However, in this case the closing operation is most often used in the 
exchange step, which is called after nearly all facilities have been opened. Then 
most of the cities are connected to the facility located at the same site, and 
closing a facility affects only one city. So finding the new closest and second 
closest facilities can be done in O(m). Thus, it is not surprising that an 
implementation without queues was able to improve the execution times on the 
large problems with n = m > 1000 by factors of up to 3. Nevertheless we chose 
to implement priority queues in our version of LOCAL as their theoretical 
advantage leads to shorter execution times on average. 

2.5 Scaling and Local Search 

In [10] a scaling technique was proposed to improve the approximation bound 
of local search for the metric UFLP. In the beginning all costs are scaled up 
by a factor of Then the search is executed on the scaled instance. Of all 
candidates found the algorithm exits with the one having the smallest cost for 
the unscaled instance. With this technique the search is advised to open the most 
economical facilities. However, the solution space of the scaled instance might 
not be similar to the space of the unscaled instance. Therefore in practice it is 
likely that the scaled version ends with inferior solutions. It becomes obvious that 
this adjustment is just for lowering theoretical bounds and has limited practical 
use. The scaling technique was proposed for Local Search on the metric UFLP. 
However, it deteriorates the performance of Local Search on metric as well as 
non-metric instances. Please see [14] for experimental results. 

3 Conclusions 

The uncapacitated facility location problem was solved by 5 different algorithms 
from different areas of optimization research. The deterministic algorithms 
manage to find good solutions on the benchmarks in short execution times. Generally 
MYZ can improve the performance of JMS at the expense of little extra 
execution time. On the tested metric instances the performance of the algorithms is 
competitive with the heuristic and randomized algorithms tested while the 
execution times remain significantly shorter. Here JMS offers slightly better solutions 
than MYZ. The approximation algorithms show higher errors only on a few 
tested non-metric instances, but always deliver solutions that are within 5% of 
optimum. The presented Local Search profits from the intelligent use of data 
structures. On a number of instances the execution times are able to compete 
with those of MYZ and JMS, but due to changing starting points the algorithm 



Fig. 1. Plot of solution costs and execution times in comparison to TABU 
(horizontal axis: execution time; plotted series: JMS, MYZ, LOCAL, V&RR) 



is not very robust. Scaling techniques that lead to improved approximation fac- 
tors deteriorate the performance of the algorithm in practice. TABU is able to 
find optimal solutions in most cases. It is much faster than V&RR (and Local 
Search on special instances), but the execution times cannot compete with those 
of MYZ and JMS. The tested version of the Volume algorithm V&RR is not 
competitive regarding solution quality and execution times. 

The preference for a method in practice depends on the properties of the prob- 
lem instances and the setting. If speed is most important, JMS or MYZ should 
be used, especially if metric instances are to be solved. If solution quality is most 
important, TABU should be used. In a general setting the results indicate a pref- 
erence for TABU, as it achieves best solution quality in a reasonable amount of 
time. 

Finally we present a plot of the results in relation to TABU. The x- and y- 
coordinates represent values regarding execution time and solution cost, respec- 
tively. The coordinates were calculated by dividing the results of the algorithms 
by the results of TABU. We further adjusted some of the data by averaging 
over the D- and E-instances of Bilde-Krarup and the instances of the same size 
of k-median, respectively. There are 22 dots for each algorithm. 

Only a few dots are located in the lower half of the plot, i.e. TABU was hardly 
ever outperformed in terms of solution cost. Moreover, there is hardly any 
dot in the lower left quadrant, which indicates better performance in terms of 
execution time and solution cost. Most of the dots of JMS and MYZ are located 
in the upper left quadrant, indicating faster performance with higher solution 
costs. Most of the dots of V&RR are located in the upper right part, which means 
worse performance regarding execution time and solution cost. Most of the dots 
of LOCAL are spread closely above the line in the upper half, which is due to 
slightly higher solution costs, the faster performance on smaller and the slower 
performance on larger instances. 


Abstract. We consider the problem of enumerating, in order of 
increasing length, the K shortest paths between a given pair of nodes 
in a weighted digraph G with n nodes and m arcs. To solve this 
problem, Eppstein’s algorithm first computes the shortest path tree and then 
builds a graph $D(G)$ representing all possible deviations from the shortest 
path. Building $D(G)$ takes $O(m + n \log n)$ time in the basic version of the 
algorithm. Once it has been built, the K shortest paths can be obtained 
in order of increasing length in $O(K \log K)$ time. However, experimental 
results show that the time required to build $D(G)$ is considerable, 
thereby reducing the practical interest of the algorithm. In this paper, 
we propose a modified version of Eppstein’s algorithm in which only the 
parts of $D(G)$ which are necessary for the selection of the K shortest 
paths are built. This version maintains Eppstein’s worst-case running 
time and entails an important improvement in practical performance, 
according to experimental results that are also reported here. 



1 Introduction 

Enumerating, in order of increasing length, the K shortest paths between two 
given nodes, s and t, in a digraph G = (V, E) with n nodes and m arcs is a fun- 
damental problem that has many practical applications and has been extensively 
studied [1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]. 

The algorithm with the lowest asymptotic worst-case time complexity that 
solves this problem is due to Eppstein [6]. After computing the shortest path 
from every node in the graph to t, Eppstein’s algorithm builds a graph $D(G)$ 
which represents all possible one-arc deviations from the shortest path tree. 
Building $D(G)$ takes $O(m + n \log n)$ time in the basic version of the algorithm, 
and $O(m + n)$ time in a more elaborate but “rather complicated” [6] version. 
The graph $D(G)$ implicitly defines a path graph $P(G)$ such that the K shortest 
paths from its initial node to any other node represent the K shortest s-t paths 
in G, and it takes $O(K \log K)$ time to compute them once $D(G)$ has been built. 

In [9], a different algorithm for computing the K shortest paths between 
two given nodes was proposed that under many circumstances runs significantly 
faster in practice. This algorithm, known as the Recursive Enumeration 
Algorithm (REA), computes every new s-t path by recursively visiting at most the 
nodes in the previous s-t path and using a heap of candidate paths associated 
to each node from which the next path from s to the node is selected. 
After computing the shortest path from s to every node, the algorithm computes 
the K shortest paths in order of increasing length in $O(m + Kn \log(m/n))$ time. 
However, this is a worst-case bound which is only achievable in extremely rare 
situations. The REA can be considered as a lazy evaluation algorithm, in which 
the K shortest paths to intermediate nodes are only computed when they are 
required. In this way, a huge amount of computation effort can be saved. 
Experimental results, reported in [9], showed that, in practice, the REA outperforms 
Eppstein’s algorithm [6] for different kinds of randomly generated graphs. 

* This work has been supported by the Generalitat Valenciana under grant 
CTIDIA/2002/209 and by the Spanish Ministerio de Ciencia y Tecnologia and 
FEDER under grant TIC2002-02684. 

In this paper, we apply a similar idea to Eppstein’s algorithm. The 
experimental results in [9], reproduced in Sect. 5, show that the time required by 
Eppstein’s algorithm to build $D(G)$ is considerable, and that this is the reason 
for its practical inefficiency. Once $D(G)$ has been built, enumerating the K 
shortest paths is extremely fast. Here we propose a modified version in which a 
recursive function builds only the parts of $D(G)$ that are necessary for the 
selection of the K shortest paths. In this way, the asymptotic worst-case 
complexity is maintained and a considerable reduction in computation effort is 
achieved in practice, according to experimental results with different kinds of 
randomly generated graphs. The new version is not only much faster than the 
original algorithm, but also faster than the REA in many cases. 

The rest of this paper is organised as follows. After introducing some basic 
definitions and notation in Sect. 2 and summarizing Eppstein’s algorithm in 
Sect. 3, the proposed modification to Eppstein’s algorithm is presented in Sect. 4. 
Experimental results comparing this new version with the original one and with 
the REA are reported in Sect. 5. Finally, Sect. 6 contains conclusions and final 
remarks. 



2 Problem Formulation and Notation 

Let $G = (V, E)$ be a directed graph, where V is the set of nodes and 
$E \subseteq V \times V$ is the set of arcs. Given $e = (u, v) \in E$, we call u 
the tail of e, $tail(e)$, and v the head of e, $head(e)$. Let 
$\ell : E \to \mathbb{R}$ be a function mapping arcs to real-valued lengths. 

Given two nodes u and v in V, a path from u to v (or u-v path) is a sequence 
of arcs $p = p_1 \cdot p_2 \cdots p_{|p|}$, where $tail(p_1) = u$, 
$head(p_{|p|}) = v$, and $head(p_i) = tail(p_{i+1})$ for $1 \leq i < |p|$. The 
length of p is $L(p) = \sum_{1 \leq i \leq |p|} \ell(p_i)$. In this paper 
we consider the problem of enumerating, in order of increasing length, the K 
paths from a starting node s to a terminal node t with minimum total length. 
It will be assumed that G does not contain negative length cycles. 

For each v in V, let $d(v, t)$ be the length of the shortest v-t path. Let T be the 
single-destination shortest path tree containing the shortest v-t path for each v 
in V. Let $out(v)$ be the set of arcs with tail v that are not in the shortest path 
from v to t, and let $next_T(v)$ be the node w following v in T, so that the shortest 
path from v to t is the arc $(v, w)$ followed by the shortest path from w to t. 

For each $(u, v)$ in E, let us define $\delta(u, v) = \ell(u, v) + d(v, t) - d(u, t)$. 
The value $\delta(u, v)$ is the additional cost we pay if, instead of following the 
shortest path from u to t, we first take the arc $(u, v)$ and then follow the 
shortest path from v to t. Trivially, $\delta(u, v) \geq 0$ for each $(u, v)$ in E, 
and $\delta(u, v) = 0$ for each $(u, v)$ in T. The length of an s-t path p can be 
obtained by adding the value of $\delta$ for the arcs of p to $d(s, t)$ and, because 
$\delta(u, v) = 0$ for the arcs $(u, v)$ of T, we only need to add the value of 
$\delta$ for the arcs of p that are not in T [6]. Let $sidetracks(p)$ be the 
subsequence of arcs of p that are not in T, let 

$$L(sidetracks(p)) = \sum_{(u,v) \in sidetracks(p)} \delta(u, v), \qquad (1)$$

and let $S^k$ be the sidetracks of the kth shortest path from s to t. Since 
$L(p) = d(s, t) + L(sidetracks(p))$ and any path p can be made explicit from 
$sidetracks(p)$ and T in time proportional to $|p|$, the computation of the K 
shortest paths problem can be restated as the computation of T and the sidetrack 
sequences $S^k$ of the K shortest paths [6]. This is the aim of Eppstein’s algorithm. 
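
The identity $L(p) = d(s, t) + L(sidetracks(p))$ can be checked on a small example. The sketch below is an illustration only, assuming non-negative arc lengths so that Dijkstra's algorithm on the reversed graph yields $d(v, t)$ and $next_T(v)$; the function names and the toy graph are hypothetical.

```python
import heapq


def distances_to(t, arcs):
    """Dijkstra on the reversed graph: returns d(v, t) and next_T(v)."""
    incoming = {}
    for (u, v), length in arcs.items():
        incoming.setdefault(v, []).append((u, length))
    d, nxt, heap = {t: 0}, {}, [(0, t)]
    while heap:
        dv, v = heapq.heappop(heap)
        if dv > d[v]:
            continue  # stale queue entry
        for u, length in incoming.get(v, []):
            if dv + length < d.get(u, float("inf")):
                d[u] = dv + length
                nxt[u] = v  # first arc of the shortest u-t path is (u, v)
                heapq.heappush(heap, (d[u], u))
    return d, nxt


def delta(arc, arcs, d):
    """Extra cost of taking arc (u, v) instead of the shortest u-t path."""
    u, v = arc
    return arcs[arc] + d[v] - d[u]


def sidetracks(path, nxt):
    """Arcs of the path that are not in the shortest path tree T."""
    return [(u, v) for (u, v) in path if nxt.get(u) != v]


# Hypothetical toy graph with arc lengths.
arcs = {("s", "a"): 1, ("s", "b"): 4, ("a", "b"): 1, ("a", "t"): 5, ("b", "t"): 1}
d, nxt = distances_to("t", arcs)
path = [("s", "b"), ("b", "t")]
length_via_sidetracks = d["s"] + sum(delta(e, arcs, d) for e in sidetracks(path, nxt))
```

For the path s-b-t the only sidetrack is the arc (s, b), and the value computed from the sidetracks coincides with the sum of the arc lengths of the path.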

3 Eppstein’s Algorithm 

In this section we summarize, for the sake of completeness and to make clear 
the modification proposed in Sect. 4, the basic version of Eppstein’s algorithm. 
We base our description on [6], where a more detailed explanation and a proof 
of correctness can be found. 

Once T has been computed, Eppstein’s algorithm builds, in time 
$O(m + n \log n)$, a graph $D(G)$ whose nodes are arcs in $E - T$ and which 
represents all possible s-t paths in G differing from T in only one arc, scored 
by $\delta$. The graph $D(G)$ implicitly defines the so-called path graph $P(G)$, 
in which the weights of arcs are chosen so that finding the K best paths from the 
initial node to any destination in $P(G)$, a problem that can be solved in time 
$O(K \log K)$, is equivalent to finding the K best paths from s to t in G. A more 
detailed step by step description of this algorithm is provided below. 

Step 1 Compute T and $\delta(u, v)$ for all $(u, v) \in E$. The time taken by this 
step depends on the kind of graph [4]. For acyclic graphs, T can be obtained in 
$O(m)$ time. If the graph contains cycles but the arc lengths are not negative, then 
Dijkstra’s algorithm combined with Fibonacci heaps performs this computation 
in $O(m + n \log n)$ time. In the general case it can be done by means of the 
Bellman-Ford algorithm in $O(mn)$ time. 

Step 2 For each $v \in V$, in any order such that $next_T(v)$ is processed before v: 
Step 2.1 Build a heap $H_{out}(v)$ whose elements are the arcs in $out(v)$, 
heap-ordered by the value of $\delta$ in such a way that the root of $H_{out}(v)$, 
denoted $outroot(v)$, has only one child (see Fig. 1). 

Fig. 1. The heap $H_{out}(v)$ contains all the arcs $(v, w)$ in E except the first arc 
in the shortest path from v to t (represented by the horizontal arcs). The black 
circles represent nodes in this heap and correspond to arcs in G 

Fig. 2. $H_G(v)$ contains $H_{out}(v)$ and the elements in $H_G(next_T(v))$. $H_G(t)$ only 
contains $H_{out}(t)$ 



Step 2.2 If $v = t$, let $H_G(t)$ be the heap whose only element is $H_{out}(t)$; 
else, build a heap $H_G(v)$ by inserting $H_{out}(v)$ in $H_G(next_T(v))$ with score 
$\delta(outroot(v))$ in a persistent (non destructive) way (see Fig. 3). This insertion 
is guided by a balanced heap containing only $outroot(w)$ for each w in the 
shortest v-t path. The root of $H_G(v)$ will be denoted $h(v)$. 

It is possible to build $H_{out}(v)$ in time proportional to the cardinality of $out(v)$ 
by first finding its root and then heapifying the rest of the elements in $out(v)$. 
Thus, Step 2.1 takes $O(m)$ time. Every heap $H_G(v)$ is built without modifying 
$H_G(next_T(v))$ in time $O(\log n)$. Therefore, the time required to perform 
Step 2.2 for all nodes is $O(n \log n)$, and the total time required by Step 2 is 
$O(m + n \log n)$. 

For each v in V, the second best path from v to t differs from T in only one arc, 
which belongs to $out(w)$ for some node w in the shortest path from v to t. This 
arc is $h(v)$. In particular, $h(s)$ is $S^2$. 

Fig. 3. (a) Heap $H_G(v)$. (b) Real representation in memory of $H_G(v)$: the 
nodes of $H_G(next_T(v))$ that should be modified by the insertion of $H_{out}(v)$ are 
replicated and linked to the corresponding subtrees in $H_G(next_T(v))$ 

Step 3 The set of heaps $H_G(v)$ for all $v \in V$ forms a directed acyclic graph, 
$D(G)$, with $O(m + n \log n)$ nodes. Each node in $D(G)$ corresponds to an arc of G 
and each arc in G is represented by at least one node in $D(G)$. For each $(e, f)$ in 
which e is the parent of f in any heap $H_G(v)$, there is an arc $(e, f)$ in $D(G)$. The 
graph $D(G)$ defines a different graph $P(G)$ which is the result of augmenting 
$D(G)$ as follows: 

1. Associate a weight $\delta(f) - \delta(e)$ to each arc $(e, f)$ in $D(G)$. 

2. For each node e in $D(G)$, add an arc $(e, h(head(e)))$ with weight 
$\delta(h(head(e)))$. These arcs are called cross-arcs. 

3. Add an initial node r and an arc from r to $h(s)$ with weight $\delta(h(s))$. 

In this way, each path p in $P(G)$ from r to any node is associated to the s-t path 
in G whose sidetracks are the tails of the cross-arcs in p followed by the last node 
in p. The problem of enumerating the K shortest s-t paths in G reduces to the 
problem of enumerating the K shortest paths from r to any node in $P(G)$. This 
can be efficiently done with the help of a priority queue Q whose elements are 
sidetracks of s-t paths scored by the function L defined in (1). It is not necessary 
to explicitly build $P(G)$; instead, once $D(G)$ has been obtained, the sidetracks 
of the K shortest paths are obtained as follows: 

Step 3.1 Initialize Q to $\{h(s)\}$. 

Step 3.2 For $k = 2$ to K: 

Step 3.2.1 If Q is empty, stop (no more s-t paths exist); else, extract $S^k$, 
the element in Q with the lowest score. Let e be the last sidetrack of $S^k$. 

Step 3.2.2 Insert $S^k \cdot f$ in Q with score $L(S^k) + \delta(f)$, where f is the 
sidetrack $h(head(e))$. 

Step 3.2.3 For each sidetrack f such that $(e, f)$ is an arc in the graph 
$D(G)$, insert $prefix(S^k) \cdot f$ in Q with score $L(S^k) - \delta(e) + \delta(f)$, 
where $prefix(S^k)$ is the sequence $S^k$ except its last sidetrack. 

Note that all the sidetracks in Q can be efficiently represented as a prefix tree 
and that they are incrementally scored. Since the out-degree of each node in 
$D(G)$ is at most 3, Step 3 takes $O(K \log K)$ time [6]. 

Eppstein’s algorithm: 

1. Compute T and $\delta(u, v)$ for all $(u, v)$ in E. 
2. For each vertex v in V: 
   2.1 Build $H_{out}(v)$. 
   2.2 Insert $H_{out}(v)$ into $H_G(next_T(v))$ to form $H_G(v)$. 
   As a result, $D(G)$ is obtained. 
3.1 Initialize Q to $\{h(s)\}$. 
3.2 For $k = 2$ to K: 
   3.2.1 If $Q = \emptyset$, stop; else, extract $S^k$, the element in Q with the 
         lowest score. Let e be the last sidetrack of $S^k$. 
   3.2.2 Insert $S^k \cdot f$ in Q with score $L(S^k) + \delta(f)$, where f is the 
         sidetrack $h(head(e))$. 
   3.2.3 For each sidetrack f such that $(e, f)$ is an arc in the graph $D(G)$, 
         insert $prefix(S^k) \cdot f$ in Q with score $L(S^k) - \delta(e) + \delta(f)$. 

Lazy version: 

1. Compute T and $\delta(u, v)$ for all $(u, v)$ in E. 
2. Call $BuildH_G(s)$. 
3.1 Initialize Q to $\{h(s)\}$. 
3.2 For $k = 2$ to K: 
   3.2.1 If $Q = \emptyset$, stop; else, extract $S^k$, the element in Q with the 
         lowest score. Let e be the last sidetrack of $S^k$. 
   3.2.2 Call $BuildH_G(head(e))$. 
   3.2.3 Insert $S^k \cdot f$ in Q with score $L(S^k) + \delta(f)$, where f is the 
         sidetrack $h(head(e))$. 
   3.2.4 For each sidetrack f such that $(e, f)$ is an arc in the graph $D(G)$, 
         insert $prefix(S^k) \cdot f$ in Q with score $L(S^k) - \delta(e) + \delta(f)$. 

Function $BuildH_G(v)$: 
   A. If $H_G(v)$ has not been built before then 
      A.1. Build $H_{out}(v)$. 
      A.2. If $v = t$ then let $H_G(v) = H_{out}(v)$, else call $BuildH_G(next_T(v))$ 
           and insert $H_{out}(v)$ into $H_G(next_T(v))$ to form $H_G(v)$. 

Fig. 4. Eppstein’s algorithm (left) and the proposed lazy version (right), side 
by side 



4 Lazy Version of Eppstein’s Algorithm 

In Step 2, Eppstein’s algorithm builds D(G), that is, H_out(v) and H_G(v) for
every node v in V. This step takes O(m + n log n) time, which in many practical
situations is a high cost, and is performed before computing S^2 in Step 3. In this
section we propose a modification that reduces this cost by building only the
heaps H_out(v) and H_G(v) that are really needed to compute S^2, ..., S^K.

This can be done by means of a recursive function BuildH_G(v) that detects
whether H_G(v) has already been built. If H_G(v) exists, then the function does
nothing; else, it proceeds to build H_G(v) by inserting H_out(v) into H_G(next_T(v))
after recursively calling BuildH_G(next_T(v)). The insertion procedure is
performed in the same nondestructive way as in Eppstein’s original algorithm, in
O(log n) time. The recursive calls go through the nodes of the shortest v-t path,
and stop when they reach t or a node w whose H_G(w) has already been built.
In the base case v = t, H_G(t) only contains H_out(t).

In Step 3.1 of Eppstein’s algorithm, the priority queue Q is initialized with
h(s), which is the root of H_G(s). Therefore, in the new version we build H_G(s)
before Step 3.1 by calling BuildH_G(s). This call will recursively build H_G(w) for
each node w in the shortest s-t path.

In Step 3.2.2 of Eppstein’s algorithm, h(head(e)) is used, where e is the last
sidetrack of S^k and h(head(e)) is the root of H_G(head(e)). In the new version,
calling BuildH_G(head(e)) ensures that H_G(head(e)) is available.

Thus, instead of completely building D(G) at the beginning, its construction
is performed simultaneously with the generation of the K shortest paths, by
carrying out a recursive traversal of paths similar to the one done by the REA.
The function BuildH_G(v) is called only when H_G(v) is required and, therefore,
the computation of H_G(v) for many nodes v in V can be saved. This can be
considered a lazy evaluation version of Eppstein’s algorithm. The worst-case
asymptotic complexity is the same as that of Eppstein’s original algorithm because,
as k increases, the lazy version will tend to compute H_out(v) and H_G(v) for
every v in V, in total time O(m + n log n). However, in practice, the time saved
can be considerable. The experiments presented in the next section show that
the lazy version can be significantly faster, even when computing a huge number
of shortest paths.

A complete description of this lazy version of Eppstein’s algorithm is given 
in Fig. 4, side by side with the original algorithm so that the differences can be 
more easily appreciated. 
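
The on-demand construction can be illustrated with the following Python sketch. It is a simplification under stated assumptions, not the authors' implementation: H_G(v) is modelled as a persistent leftist heap into which all sidetracks leaving v are inserted one by one (Eppstein's structure keeps H_out(v) as a separate heap and inserts only its root), and the names `LazyHeaps`, `next_t` and `sidetracks` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Heap:
    """Node of a persistent leftist min-heap; nodes are never mutated."""
    key: float
    item: object
    rank: int
    left: Optional["Heap"]
    right: Optional["Heap"]


def _rank(h):
    return h.rank if h is not None else 0


def merge(a, b):
    """Nondestructive merge: builds new nodes and shares untouched subtrees."""
    if a is None:
        return b
    if b is None:
        return a
    if b.key < a.key:
        a, b = b, a
    left, right = a.left, merge(a.right, b)
    if _rank(left) < _rank(right):
        left, right = right, left
    return Heap(a.key, a.item, _rank(right) + 1, left, right)


def insert(h, key, item):
    return merge(h, Heap(key, item, 1, None, None))


class LazyHeaps:
    def __init__(self, next_t, sidetracks, t):
        self.next_t = next_t          # v -> next node on the shortest v-t path
        self.sidetracks = sidetracks  # v -> list of (delta, sidetrack) leaving v
        self.t = t
        self.built = {}               # v -> H_G(v), filled only on demand

    def build(self, v):
        """BuildH_G(v): build H_G(v) and any missing heaps on the v-t path."""
        start = v
        pending = []
        while v not in self.built:    # stop at t or at an already built heap
            pending.append(v)
            if v == self.t:
                break
            v = self.next_t[v]
        for u in reversed(pending):   # build from the t side back to `start`
            heap = None if u == self.t else self.built[self.next_t[u]]
            for key, item in self.sidetracks.get(u, ()):
                heap = insert(heap, key, item)
            self.built[u] = heap
        return self.built[start]
```

Because `merge` never modifies existing nodes, the heap of a node stays valid after its predecessors on other shortest paths extend it, which is the property the nondestructive insertion in the text relies on.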

5 Experimental Comparison 

In this section, we report experimental results comparing, for three different
kinds of randomly generated graphs, the lazy version of Eppstein’s algorithm
presented in Sect. 4 (LVEA) with the basic version of Eppstein’s algorithm (EA) [6]
and with the Recursive Enumeration Algorithm (REA) [9]. We reproduced the
experimental conditions in [9] so that the results can be directly compared. For
the sake of clarity, we used the basic version of the REA in which the sets of path
candidates are implemented with heaps, although there are some cases in which
a mixture of heaps and unsorted arrays may be faster [9].

All these programs were implemented in C and are publicly available at
http://terra.act.uji.es/REA. Since the three algorithms share Step 1, we are
interested in comparing their performance once T has been computed. Therefore,
all time measurements start when Dijkstra’s algorithm ends. Each point in the
curves shows the average execution time for 15 random graphs generated with
the same parameters, but different random seeds. The programs were compiled
with the GNU C compiler (version 2.7) using the maximum optimization level.
The experiments were performed on a 300 MHz Pentium-II computer with 256
megabytes of RAM, running under Linux 2.0.

Results for Graphs Generated with Martins’ General Instances Gen- 
erator. First, we compared the algorithms using Martins’ general instances 
generator, the same that was used in the experiments reported in [9] and [10]. 
The input to this graph generator consists of four values: seed for the random 
number generator, number of nodes, number of arcs, and maximum arc length. 
The program creates a Hamiltonian cycle and then adds arcs at random. The 
arc lengths are uniformly distributed between 0 and 10^. 
Fig. 5. Results for graphs generated with Martins’ generator. CPU time as
a function of the number of computed paths. (a) is an enlargement of the initial
region of (b) to appreciate the differences for small values of K. (d = m/n is the
average input degree.)



Figure 5 represents the CPU time required to compute up to 10® paths by 
each of the algorithms for graphs with different average degrees: high (Fig. 5b), 
medium (Fig. 5c), and low (Fig. 5d). Figure 5a is an enlargement of Fig. 5b 
for small values of K. With regard to the behaviour of EA, it can be observed
that, once D(G) has been built, the K shortest paths are found very efficiently.
However, building this graph requires (in comparison with the other algorithms)
a considerable amount of time that can be clearly identified in the figures at the
starting point K = 2. These figures also illustrate that, in contrast, the LVEA
does not invest time in the construction of D(G) at the beginning, but instead
builds it progressively. For large values of K, the full graph D(G) is eventually
built and the running time of the LVEA converges to the running time of EA. The
REA is still faster for these graphs.

(a) N = 1000, η = 100, d = 10.   (b) N = 1000, η = 100, d = 10.
(c) N = 100, η = 100, d = 10.    (d) N = 100, η = 100, d = 100.

Fig. 6. Experimental results for multistage graphs. CPU time as a function of
the number of computed paths. (a) is an enlargement of the initial region of
(b). (Parameters: N, number of stages; η, number of nodes per stage; d, input
degree.)

Results for Multistage Graphs. Multistage graphs are of interest in many
applications and underlie many discrete Dynamic Programming problems. A
multistage graph is a graph whose set of nodes can be partitioned into N disjoint
sets (stages), V_1, V_2, ..., V_N, such that every arc in E joins a node in V_i with
a node in V_{i+1}, for some i such that 1 ≤ i < N. We used a program that
generates random multistage graphs given the number of stages, number of nodes
per stage, and input degree (which is fixed to the same value for all the nodes).
Arc lengths are again uniformly distributed between 0 and 10^. 
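
A generator of this kind could look like the Python sketch below. It is a hypothetical reconstruction from the description above, not the program used in the experiments: the function name, the node numbering, the choice of distinct predecessors and the `max_length` parameter are all assumptions.

```python
import random


def random_multistage_graph(num_stages, nodes_per_stage, in_degree,
                            max_length, seed=0):
    """Return a list of arcs (tail, head, length) of a random multistage graph.

    Node j of stage i (both 0-based) gets the number i * nodes_per_stage + j.
    Every node outside the first stage receives `in_degree` arcs from distinct
    nodes of the previous stage (assumes in_degree <= nodes_per_stage).
    """
    rng = random.Random(seed)
    arcs = []
    for stage in range(1, num_stages):
        first_prev = (stage - 1) * nodes_per_stage
        previous = range(first_prev, first_prev + nodes_per_stage)
        for j in range(nodes_per_stage):
            head = stage * nodes_per_stage + j
            for tail in rng.sample(previous, in_degree):
                arcs.append((tail, head, rng.uniform(0, max_length)))
    return arcs
```

For example, `random_multistage_graph(100, 100, 10, max_length)` would correspond to the shape of the N = 100, η = 100, d = 10 instances, with `max_length` chosen by the user.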
(a) η = 100, d = 10, K = 1000.   (b) η = 100, d = 10, K = 10000.
(c) N = 100, η = 100, K = 1000.  (d) N = 100, η = 100, K = 10000.

Fig. 7. Experimental results for multistage graphs. CPU time as a function of
the number of stages (a and b) and the input degree (c and d). (Parameters: N,
number of stages; η, number of nodes per stage; d, input degree; K, number of
computed paths.)



The results are illustrated in Figs. 6 and 7. Figure 6a is an enlargement of
Fig. 6b for small values of K. We can also observe here that, once D(G) has been
built, the K shortest paths are found by EA very efficiently, but building D(G)
is a highly costly operation and the main reason for the relative inefficiency of
EA. The time saved by the LVEA is considerable, in this case even for large values
of K. The REA does not perform as well in these graphs because it computes
the kth shortest path visiting, in the worst case, all the nodes in
the (k − 1)th shortest path, and in multistage graphs the number of nodes in
any path is the number of stages. The dependency on the number of stages
can be more clearly appreciated in Figs. 7a and 7b. Note how increasing the
number of stages does not affect the LVEA (which benefits from the
efficient representation of alternative paths as sidetracks), while it clearly affects
EA (which needs to build heaps for a larger number of nodes).

Fig. 8. Results for Delaunay triangulation graphs. CPU time as a function of
(a) the number of computed paths (for a fixed number of nodes) and (b) the
number of nodes (for a fixed number of computed paths)

Finally, Figs. 7c and 7d represent the dependency of the running time on
the input degree to compute 10^3 and 10^4 paths, respectively. In these figures the
input degree ranges from 2 to 100, and this includes the evolution from Fig. 6c
to Fig. 6d, as well as the behaviour for smaller values of the input degree. It
can be seen that the time required by the REA and the LVEA only increases
slightly when the input degree increases, while EA is clearly more affected by
this parameter, so that the difference between the algorithms increases as the
input degree increases.

Results for Delaunay Triangulation Graphs. Delaunay triangulations are 
a particular kind of graph that we have chosen because they share with many 
real-world graphs, such as road maps, the fact that shortest paths tend to be in 
a certain region of the graph, in this case, close to the straight line connecting 
the origin and destination points. Nodes in other regions of the graph do not 
participate in the K shortest paths. Both the REA and the LVEA can take advantage
of this fact and avoid building heaps associated with them. We used a graph
generator that uniformly distributes a given number of points in a square, com- 
putes their Delaunay triangulation and assigns the Euclidean distance between 
the two joined points (nodes) to each arc. The initial and final nodes are located 
on opposite vertices of the square. 

The results are illustrated by Fig. 8. In this case, the lazy version of Eppstein’s 
algorithm is much faster than the original algorithm (which is far from being 
competitive) and slightly faster than the REA. 
6 Conclusions

The algorithm proposed by Eppstein to compute the K shortest paths between
two given nodes in a graph is outstanding because of its low asymptotic
worst-case complexity [6]. However, this algorithm includes an initial stage to build
a graph D(G) from which the K shortest paths are then very efficiently
computed, and the time required by this initial stage is considerable in practice, as
the experimental results in Sect. 5 illustrate. For this reason, a simpler solution,
the so-called Recursive Enumeration Algorithm (REA) [9], is significantly faster
in many cases. In this paper, we have combined ideas from both algorithms to
propose a lazy version of Eppstein’s algorithm that avoids building D(G)
completely when only a part of it is necessary. In this way, we maintain Eppstein’s
worst-case complexity and achieve a considerable reduction in computation effort
in practice, according to experimental results with different kinds of randomly
generated graphs: the new version is not only faster than the original algorithm,
but also faster than the REA in many cases. It might, therefore, be a useful
practical alternative. Our implementation of these algorithms is publicly available at
http://terra.act.uji.es/REA.
Abstract. The problem of deciding whether a graph is 3-colorable is
NP-complete. We show that if G is a locally connected graph (the neighborhood
of each vertex induces a connected graph), then there exists a linear
algorithm which either finds a 3-coloring of G, or indicates that such a
coloring does not exist.



By Garey, Johnson and Stockmeyer [2], it is an NP-complete problem to decide
whether a graph is 3-colorable. This problem remains NP-complete for much
smaller classes of graphs, as shown for example in [1, 2, 3, 5]. On the other
hand, by König [4], a graph is 2-colorable if and only if it is bipartite. Thus
there exists a linear algorithm which either finds a 2-coloring of a graph, or
decides that such a coloring does not exist. In this paper we show that a similar
algorithm exists for 3-coloring of locally connected graphs.

We deal with finite graphs without multiple edges and loops. If G is a graph,
then V(G) and E(G) denote the sets of vertices and edges of G, respectively. For
every U ⊆ V(G) and A ⊆ E(G), denote by G[U] and G[A] the subgraphs of G
induced by U and A, respectively (note that G[U] and G[A] are the maximal and
minimal subgraphs of G satisfying V(G[U]) = U and E(G[A]) = A). For every
v ∈ V(G), denote by N(v) the set of neighboring vertices of v. We say that G
is locally connected if G[N(v)] is connected for every v ∈ V(G). G is uniquely
k-colorable if there exists a unique partition of V(G) into k independent sets.

We start with an auxiliary statement. 

Lemma 1. Suppose there is a quadruple (G, U, u, φ) such that G is a graph
with m edges, U is an independent set of vertices from G, u ∈ U, U ∪ N(u) =
V(G), φ is a mapping from U to {1, 2, 3}, φ(u) ≠ φ(x) for every x ∈ U \ {u}
which is not an isolated vertex of G, and each component of G[N(u)] contains
a vertex adjacent with a vertex from U \ {u}. Then there exists an algorithm
running in time O(m) which either finds a mapping ψ from N(u) to {1, 2, 3}
such that φ and ψ form a 3-coloring of G or indicates that such ψ does not
exist. Furthermore, if ψ exists, then it is unique.

Proof. We construct sets C ⊆ V(G) such that U ⊆ C and a mapping φ_C from
C \ U to {1, 2, 3} such that φ and φ_C form a 3-coloring of G[C]. Furthermore, φ_C
is the unique mapping with this property. At the same time we color the edges
from G − u such that the edges having exactly two ends from C are colored by
color 5, the edges having exactly one end in C are colored by color 4 and the
rest of the edges are colored by color 3. The edges of G having one end u are colored
by color 2 if the second end is from C and by color 1 if the second end is not
from C. The edge coloring is only an auxiliary process, which helps to count the
number of steps. In each stage, we always increase the color of an edge if we use
this edge during the process. Hence the algorithm has running time O(m).

Step 1: We set C := U, φ_C := ∅, color the edges having one end in U \ {u}
and {u} by colors 4 and 1, respectively, and the rest of the edges by color 3. Go to
Step 2.

Step 2: Check whether there exists an edge colored by 4. If there is no such edge,
then, since each component of G[N(u)] contains a neighbor of U \ {u}, we get
that C contains all vertices from N(u), i.e., C = V(G) and thus ψ = φ_C has the
desired properties. If there is an edge colored by 4, take its end v ∈ N(u) \ C
and consider the set C_v of vertices from C which are joined with v by an edge
having color 4. If the vertices from C_v are colored by more than one color, ψ
does not exist because the vertices from C_v are not isolated in G, whence, by
the assumptions of the lemma, φ(x) ≠ φ(u) for every x ∈ C_v. If all vertices from C_v
are colored by the same color a, then setting φ'(v) ∈ {1, 2, 3}, φ'(v) ≠ a, φ(u)
and φ'(x) = φ_C(x) for every x ∈ C \ U we get a mapping φ' from (C \ U) ∪ {v} to
{1, 2, 3} such that φ and φ' form a 3-coloring of G[C ∪ {v}]. Furthermore, φ' is
the unique mapping with this property (because φ_C is so and a ≠ φ(u)). Thus
we can set C := C ∪ {v}, φ_C := φ', increase the numbers of colors of the edges
incident with v by 1, and repeat Step 2. □

Note that ψ from Lemma 1 can be constructed in O(m) time regardless of
the cardinality of U (i.e., the number of isolated vertices of G).

Theorem 1. Let G be a locally connected graph and m = max{|V(G)|, |E(G)|}.
Then there exists an algorithm running in O(m) time which either finds a
3-coloring of G, or indicates that G is not 3-colorable. Furthermore, if G is
3-colorable, then every component of G is uniquely 3-colorable.

Proof. We construct sets C ⊆ V(G), T ⊆ C and a 3-coloring φ_C of G[C] such
that

(1) G[C] is uniquely 3-colorable;

(2) if v ∈ T, then N(v) ⊆ C;

(3) if v ∈ C, then N(v) ∩ T ≠ ∅.

At the same time we decompose E(G) into sets E_1, ..., E_5 such that

(a) E_1 = E(G − C);

(b) E_2 = E(G) \ (E(G[C]) ∪ E_1);

(c) E_3 = E(G[C \ T]);

(d) E_4 = E(G[C]) \ (E_3 ∪ E(G[T]));

(e) E_5 = E(G[T]).

Notice that E_1, E_3, E_5 are the sets of edges having both ends in V(G) \ C,
C \ T, T, respectively, and E_2, E_4 are the sets of edges having exactly one end
in C, T, respectively. For simplicity, we assume that the edges from E_i are colored
by color i (i = 1, ..., 5). We start the algorithm with C = T = ∅, E_1 = E(G),
E_2 = E_3 = E_4 = E_5 = ∅ (i.e., all edges of G are colored by 1). As in
Lemma 1, the edge-coloring helps to count the number of steps in the algorithm.

Step 1: If V(G) = ∅, we are done. Otherwise choose v ∈ V(G). If N(v) = ∅,
color v by color 3, set G := G − v, and go to Step 1. If N(v) ≠ ∅, take v_1 ∈ N(v)
and a bijective mapping φ : {v, v_1} → {2, 3}, and apply the algorithm from
Lemma 1 for the quadruple (G[N(v) ∪ {v}] − vv_1, {v, v_1}, v, φ), which satisfies
the assumptions of this lemma. We get either a (unique) 3-coloring ψ of
G[N(v) ∪ {v}], or G[N(v) ∪ {v}] and G are not 3-colorable. In the first case set
T := {v}, C := N(v) ∪ {v}, φ_C := ψ (which satisfy (1) - (3)), color the edges of
G[C] following the rules (a) - (d), and go to Step 2.

Step 2: Check whether C = T. If C = T, then by (1) - (3), G[C] is a (uniquely)
3-colorable component of G with 3-coloring φ_C. Thus we set G := G − C and
go to Step 1. If C ≠ T, then go to Step 3.

Step 3: Choose v ∈ C \ T. Now we need to check whether C' = C ∪ N(v)
and T' = T ∪ {v} satisfy (1) - (3). First take edges e_1 = vv_1, ..., e_n = vv_n,
n = |N(v)|, incident with v.

For i = 1, ..., n, do the following. If e_i has colors 2, 3 or 4, replace it by
colors 4, 4 or 5, respectively. If v_i ∈ C, then v_i is already colored. If v_i ∉ C, then
consider the edges e_{i,1} = v_i v_{i,1}, ..., e_{i,n_i} = v_i v_{i,n_i} incident with v_i and
different from e_i. If e_{i,j} has colors 1 or 2, replace it by colors 2 or 3, respectively.
(Note that if e_{i,j} joins two vertices from N(v) \ C, then we increase its color twice,
from 1 to 2 with respect to one end and then from 2 to 3 with respect to the
second end.)

Let H be the graph arising from G[N(v) ∪ {v}] after deleting all edges
belonging to G[C] and adding the (isolated) vertices from C \ (N(v) ∪ {v}). Then
E(H) is the set of edges of G for which we have increased a color from {1, 2} to
a color from {3, 4} and the quadruple (H, C, v, φ_C) satisfies the assumptions
of Lemma 1. Apply the algorithm from Lemma 1 for this quadruple. After
O(|E(H)|) steps, either we extend the 3-coloring φ_C to a 3-coloring φ_C' of G[C'],
or we show that G[C'] and G are not 3-colorable. If G[C'] is 3-colorable, then
(by Lemma 1 and since C and T satisfy (1) - (3) and G[N(v)] is connected)
also C' and T' satisfy (1) - (3). Set C := C', T := T', φ_C := φ_C'. The new
colors of the edges of G satisfy conditions (a) - (e) for the new sets C and T.
Go to Step 2.






Martin Kochol 



We finish the algorithm when we get V(G) = ∅ or when we show that G is
not 3-colorable. Furthermore, every component of G is uniquely 3-colorable if G
is 3-colorable. In Step 3, the number of operations is bounded by a constant
multiple of the number of edges for which we have increased the color. The same
holds for Step 1 if the chosen vertex v is not isolated. In Step 2, we only delete
(or separate) the vertices which are already colored. The same happens in Step 1
if the chosen vertex v is isolated. Thus the algorithm has running time O(m). □
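
The propagation idea behind the proof can be illustrated by the Python sketch below. It is not the O(m) algorithm of Theorem 1: it omits the edge-colour bookkeeping and simply re-traverses G[N(v)] for every vertex v, so its running time can be superlinear. The function name and the input format (a dictionary mapping each vertex to the set of its neighbours) are assumptions made for the example.

```python
from collections import deque


def three_color_locally_connected(adj):
    """Return a dict vertex -> colour in {1, 2, 3}, or None if the graph is
    not 3-colorable. `adj` maps each vertex to the set of its neighbours and
    is assumed to describe a locally connected graph.
    """
    color = {}
    for start in adj:
        if start in color:
            continue
        color[start] = 1
        if not adj[start]:                     # isolated vertex
            continue
        color[next(iter(adj[start]))] = 2      # choice is free up to symmetry
        expanded = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            nbrs = adj[v]
            # Traverse G[N(v)] starting from the already coloured neighbours.
            stack = [w for w in nbrs if w in color]
            seen = set(stack)
            while stack:
                x = stack.pop()
                if color[x] == color[v]:
                    return None
                forced = 6 - color[v] - color[x]   # the third colour
                for w in adj[x] & nbrs:            # v, x, w form a triangle
                    if w not in color:
                        color[w] = forced
                    elif color[w] != forced:
                        return None
                    if w not in seen:
                        seen.add(w)
                        stack.append(w)
            if len(seen) != len(nbrs):
                raise ValueError("graph is not locally connected")
            for w in nbrs:
                if w not in expanded:
                    expanded.add(w)
                    queue.append(w)
    return color
```

Every colour after the first two is forced by a triangle, which mirrors the uniqueness statement of the theorem: a conflict therefore means that no 3-coloring exists.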

Theorem 1 cannot be generalized to k-colorability for k > 3 unless P = NP.
To show this, construct for any connected graph G a locally connected graph G_n,
n ≥ 1, by adding to G a copy of K_n and joining every vertex of G with every
vertex of K_n. Then G_n is (n + r)-colorable iff G is r-colorable. Hence, by [2],
the problem of deciding whether a locally connected graph is k-colorable is
NP-complete for every fixed k > 3.
Abstract. An interval graph is the intersection graph of a collection of
intervals. One important application of interval graphs is physical
mapping in genome research, that is, reassembling the clones to determine
the relative position of fragments of DNA along the genome. The linear
time algorithm by Booth and Lueker (1976) for this problem has a
serious drawback: the data must be error-free. However, laboratory work
is never flawless. We devised a new iterative clustering algorithm for
this problem, which can accommodate noisy data and produce a likely
interval model realizing the original graph.

1 Introduction 

An interval graph is the intersection graph of a collection of intervals. This class
of graphs has a wide range of applications. An important application of interval
graphs is the construction of physical maps in genome research. Physical
maps are critical in hunting for specific genes of interest, and also useful for
further physical examination of DNA required for other genome projects. The
term “physical mapping” means the determination of the relative position of
fragments of DNA along the genome by physicochemical and biochemical
methods. The construction of physical maps is generally accomplished as follows. Long
DNA sequences are broken into smaller fragments, and then each fragment is
reproduced into the so-called clones. After deciding some fingerprints for each clone,
two clones are considered overlapping if their fingerprints are sufficiently similar.
Finally, information on pairwise overlapping determines the relative positions of
clones, thus completing the construction of physical maps [3, 4, 6, 9, 17, 19, 20].

The error-free version of the mapping problem can be modeled as an
interval graph recognition problem: given a graph G = (V, E), find a family of
intervals such that each interval corresponds to one vertex of the graph, and
two vertices are adjacent if and only if their corresponding intervals
overlap [2, 10, 11, 12]. However, data collected from laboratories unavoidably
contain errors, such as false positives (FPs: two clones reported as overlapping
are actually non-overlapping) and false negatives (FNs: two clones reported as
non-overlapping are actually overlapping). Because a single error might cause the
clone assembly to fail, traditional recognition algorithms can hardly be applied
to noisy data directly. Moreover, no straightforward extension of traditional
algorithms can overcome these drawbacks.

Four typical models have been proposed for dealing with errors. The
definitions are as follows: 1) interval graph completion problem: assume the input data
only contain FNs and minimize the number of edges whose addition makes the
graph an interval graph [13, 15, 18]. 2) interval graph deletion problem: assume there
are only FPs in the input data, and minimize the number of edges whose
deletion makes the graph an interval graph [7]. 3) interval sandwich problem: assume
that some pairs of clones are definite overlaps, some are definite non-overlaps,
and the rest are unknown, then construct an interval graph under these
overlapping constraints [8, 14]. 4) intervalizing k-colored graphs problem: assume that
clones are created from k copies of a DNA molecule, and some pairs of clones
are definite overlaps. The objective is to generate a k-colorable interval graph
with the overlapping conditions [1, 5, 7, 8]. However, the above models suffer
from the following two drawbacks: 1. all of the above models have
been shown to be NP-hard [5, 7, 8, 22], and it would be difficult to define an
associated “single objective optimization problem” for approximation because the
errors could be intertwined; 2. even if one can find the best solution,
this solution might not make any biological sense.

To cope with this dilemma, consider the nature of error treatment.
Generally, data collected in real life contain a small percentage of errors. Suppose the
error percentage is 5% under careful control. The challenge is thus to separate
the 95% correct information from the 5% incorrect information automatically.
We designed an algorithm to deal with errors based on local structure matching.
The idea is very similar to the one employed in [16]. Our philosophy is that, in
order to determine whether certain overlapping information is valid or noisy, we
check the neighborhood data to see if it conforms “approximately” to a
particular local structure dictated by the problem. The probability that an isolated
piece of spurious information has a well-behaved neighborhood structure is nil.
More precisely, in our analysis, if there is enough valid information in the
input data, then a certain monotone structure of the overlapping information on
the neighborhood will emerge, allowing us to weed out most errors. We do not
set any “global” objective to optimize. Rather, our algorithm tries to maintain
a certain “local” monotone structure, namely, to minimize the deviation from
the local monotone structure as much as possible.

The kind of error-tolerant behavior considered here is similar in nature
to that of algorithms for voice recognition or character recognition problems. Thus, it
would be difficult to “guarantee” that the clustering algorithm always produces
a desirable solution (such as one that is a fixed percentage away from the
so-called “optimal solution”); the result should be justified through benchmark
data and real life experiences. Our experimental results show that, when the
error percentage is small, our clustering algorithm is robust enough to discover
certain errors and to correct them automatically most of the time.

The remaining sections are organized as follows. Section 2 gives the basic
definitions and notation. An interval graph test based on [16] is discussed
in Section 3, which forms the basis of our clustering algorithm. Section 4, the
main part of this paper, illustrates how to deal with errors in the input data. The
experimental results are shown in Section 5. Section 6 contains some concluding
remarks.

2 Basic Definitions 

In this paper all graphs are assumed to be undirected, simple, and finite. For
a graph G = (V, E), denote its number of vertices by n and its number of edges
by m. Given a vertex u in G, define N[u] to be the set of vertices including u
and those vertices adjacent to u in G; define N(u) to be N[u] − {u}. For some
subset M of V, define N(M) to be the set of those vertices that are not in M but
adjacent to some vertices in M. Thus, we have N(N[u]) = {x | x is not in N[u]
but adjacent to some vertices in N[u]}, which is the second-tier neighborhood in
a breadth-first search from u. This kind of neighborhood plays a crucial role in
our clustering analysis. We define relations between two adjacent vertices using
the above sets of neighbors. Two adjacent vertices u, v in G are said to be strictly
adjacent (STA) if neither N[u] nor N[v] is contained in the other. We denote
the set of those vertices strictly adjacent to u by STA(u). A vertex u is
said to be contained in another vertex v if N[u] is contained in N[v].
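
These definitions translate directly into set operations. The Python sketch below is only an illustration; the function names and the adjacency format (a dictionary mapping each vertex to the set of its neighbours) are assumptions, not notation from the paper.

```python
def closed_neighborhood(adj, u):
    """N[u]: u together with its neighbours."""
    return adj[u] | {u}


def second_tier_neighborhood(adj, u):
    """N(N[u]): vertices outside N[u] that are adjacent to a vertex of N[u]."""
    n_u = closed_neighborhood(adj, u)
    return {x for w in n_u for x in adj[w]} - n_u


def strictly_adjacent(adj, u):
    """STA(u): neighbours v of u with N[u] and N[v] incomparable by inclusion."""
    n_u = closed_neighborhood(adj, u)
    result = set()
    for v in adj[u]:
        n_v = closed_neighborhood(adj, v)
        if not n_u <= n_v and not n_v <= n_u:
            result.add(v)
    return result


def is_contained_in(adj, u, v):
    """True if vertex u is contained in vertex v, i.e. N[u] is a subset of N[v]."""
    return closed_neighborhood(adj, u) <= closed_neighborhood(adj, v)
```

On the path a-b-c-d, for instance, a is contained in b, the only vertex strictly adjacent to b is c, and the second-tier neighborhood of b is {d}.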

Each interval graph has a corresponding interval model in which two intervals
overlap if and only if their corresponding vertices are adjacent¹. However, the
corresponding interval model is usually far from unique, because of variations
of the endpoint orderings. To obtain a unique interval model representation,
consider the following block structure of endpoints: Denote the right (resp. left)
endpoint of an interval u by R(u) (resp. L(u)). In an interval model, define
a maximal contiguous set of right (resp. left) endpoints as an R-block (resp.
L-block). Thus, the endpoints can be grouped as an alternating left-right block
sequence. Since an endpoint block is a set, the endpoint orderings within the
block are ignored. The overlapping relationship remains unchanged if one
permutes the endpoint order within each block. Denote the right block containing
R(u) by B_R(u), the left block containing L(u) by B_L(u), and the block
subsequence from B_L(u) to B_R(u) by [B_L(u), B_R(u)]. An endpoint R(w) (resp.
L(w)) is said to be contained in an interval u if B_R(w) (resp. B_L(w)) is contained
in [B_L(u), B_R(u)].

Let G be an interval graph. Consider an interval model for G. For an
interval u, the neighborhood of u can be partitioned into A(u), B(u), C(u), D(u),
where A(u) consists of those intervals that strictly overlap u from the left side; B(u)
consists of those intervals that strictly overlap u from the right side; C(u) consists
of those vertices that properly contain u; D(u) consists of those vertices that
are properly contained in u. We call these sets A(u), B(u), C(u), D(u) the left
neighborhood, the right neighborhood, the outer neighborhood, and the inner
neighborhood, respectively. On the other hand, the second-tier neighborhood of u
can be partitioned into LL(u) and RR(u), where LL(u) consists of those intervals that are
completely to the left of u and overlap some neighbors of u; and RR(u) consists
of those intervals that are completely to the right of u and overlap some
neighbors of u. We call LL(u) the left second-tier neighborhood and RR(u) the right
second-tier neighborhood. An example of A(u), B(u), C(u), D(u), LL(u), and RR(u)
is shown in Figure 1, where A(u) = {4, 5}, B(u) = {9, 10}, C(u) = {6, 7},
D(u) = {8}, LL(u) = {1, 2, 3}, and RR(u) = {11, 12}.

¹ For convenience, we shall not distinguish between these two terms, “vertex” and its
corresponding “interval”.

Fig. 1. An example of A(u), B(u), C(u), D(u), LL(u), and RR(u)
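
When an interval model is available, the six sets can be read off from the endpoint coordinates. The following Python sketch illustrates this under the assumption that all endpoints are distinct real numbers; the function name, the input format (a dictionary mapping each interval name to a pair (left, right)) and the sample intervals used to exercise it are hypothetical and are not the intervals of Figure 1.

```python
def classify_neighborhood(intervals, u):
    """Partition the neighbours and second-tier neighbours of interval u.

    Returns a dict with keys 'A', 'B', 'C', 'D', 'LL', 'RR', each a set of names.
    """
    lu, ru = intervals[u]
    sets = {name: set() for name in ("A", "B", "C", "D", "LL", "RR")}
    outside = []
    for w, (lw, rw) in intervals.items():
        if w == u:
            continue
        if rw < lu or ru < lw:              # disjoint from u
            outside.append(w)
        elif lw < lu and rw < ru:           # strictly overlaps u from the left
            sets["A"].add(w)
        elif lu < lw and ru < rw:           # strictly overlaps u from the right
            sets["B"].add(w)
        elif lw < lu and ru < rw:           # properly contains u
            sets["C"].add(w)
        else:                               # properly contained in u
            sets["D"].add(w)
    neighbours = sets["A"] | sets["B"] | sets["C"] | sets["D"]
    for w in outside:
        lw, rw = intervals[w]
        touches_neighbour = any(
            lw < intervals[x][1] and intervals[x][0] < rw for x in neighbours
        )
        if touches_neighbour:
            sets["LL" if rw < lu else "RR"].add(w)
    return sets
```

In the clustering setting of the paper only the graph is given, so this classification has to be inferred from neighborhood structure rather than from coordinates; the sketch only makes the definitions concrete.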

3 An Interval Graph Test 

To the best of our knowledge, no straightforward extension of existing linear time
algorithms can handle errors. The idea of [16], however, can be modified to
yield a clustering version that can deal with noisy data. In this section, we
describe a quadratic time interval graph test, which adopts some techniques similar
to [16]. Notably, the time complexity is not a major concern for algorithms on
noisy data.

The basic idea of this algorithm is very simple: The vertices are processed 
one by one according to an ascending order of their degree. For each vertex u, 
we decide the unique left-right block sequence that records the relative positions 
of endpoints within u, based on a robust local structure on its neighbors. If the 
unique left-right block sequence within u intersects other existing left-right block 
sequences, all the left-right block sequences are further merged into a new left- 
right block sequence. Finally, if graph G is an interval graph, after all vertices 
have been processed, we will obtain the unique left-right block sequence that
realizes graph G; otherwise, the algorithm will terminate in some iteration due
to the failure of left-right block sequence construction. 

For each vertex u in G, our algorithm performs three main steps: 1) neighborhood
classification, 2) block sequence determination, and 3) vertex replacement.
The first step, neighborhood classification, classifies vertices adjacent to u into
A(u), B(u), C(u) and D(u). Since the block sequence within u relates to a robust
local structure on A(u) and B(u), this classification is significant for our interval
graph test. The second step, block sequence determination, decides the unique
left-right block sequence within u according to a monotone structure on A(u)
and B(u), and merges this block sequence with another existing block sequence,
if necessary. The last step, vertex replacement, generates a “special vertex” $u^s$
which is adjacent to all neighbors of u and of the special vertices strictly adjacent
to u. We associate $u^s$ with the corresponding left-right block sequence of u
constructed in the second step. We then remove vertices whose endpoints are both
contained in the block sequence of $u^s$, and delete all edges between $A(u^s)$ and
$B(u^s)$, since information about those deleted edges and vertices is no longer
needed. After vertex replacement, the graph is further reduced.

The main iteration of our interval graph test is described in Algorithm 1.
The following definitions are needed to describe the algorithm. 

Definition 1. A collection of sets is said to be monotone if every two sets $S_i$, $S_j$
in the collection are comparable, that is, either $S_i \supseteq S_j$ or $S_i \subseteq S_j$.
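Definition 1 can be checked mechanically. The sketch below (the helper name `is_monotone` is ours) sorts the sets by size and verifies that consecutive sets are nested, which is equivalent to pairwise comparability.

```python
def is_monotone(collection):
    """Return True if every two sets in the collection are comparable.

    After sorting by size, a collection is a chain under inclusion
    exactly when each set is contained in the next one.
    """
    chain = sorted((frozenset(s) for s in collection), key=len)
    return all(a <= b for a, b in zip(chain, chain[1:]))
```

For instance, `[{1}, {1, 2}, {1, 2, 3}]` is monotone, whereas `[{1, 2}, {2, 3}]` is not.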

Definition 2. An interval u is said to be compatible with a left-right block
sequence $LB_1, RB_1, LB_2, RB_2, \ldots, LB_d, RB_d$ if the left (resp. right) endpoints
within u are contained in $LB_1, LB_2, \ldots, LB_d$ (resp. $RB_1, RB_2, \ldots, RB_d$),
and, letting $RB_{j_1}$ (resp. $LB_{j_2}$) be the leftmost R-block (resp. rightmost L-block)
having nonempty intersection with endpoints within u, all blocks in between (but
excluding) $RB_{j_1}$ and $LB_{j_2}$ are contained in N(u).

Algorithm 1 The Interval-graph-test: Processing an original vertex u

1. Neighborhood classification:
   1.1 Construct the following sets: $C(u) \leftarrow \{w \mid N[w] \supset N[u]\}$,
       $D(u) \leftarrow \{w \mid N[w] \subset N[u]\}$, and $STA(u) \leftarrow N(u) - C(u) - D(u)$.
   1.2 Partition STA(u) into A(u) and B(u):
       (1) Let $u^*$ be a vertex in STA(u) with the largest
           $|N(u^*) \cap (N(STA(u)) - N(u))|$.
       (2) Let $LL(u) \leftarrow \{w \mid w \in N(u^*) \cap (N(STA(u)) - N(u))\}$, and
           $RR(u) \leftarrow N(STA(u)) - N(u) - LL(u)$.
       (3) Let $A(u) \leftarrow STA(u) \cap N(LL(u))$ and $B(u) \leftarrow STA(u) - A(u)$.
   1.3 Let $u_{SL}$ be the special interval such that $u_{SL} \in A(u)$, and let $u_{SR}$
       be the special interval such that $u_{SR} \in B(u)$.

2. Block sequence determination:
   2.1 Find the collection of sets $\{N(w) \cap B(u) \mid w \in A(u)\}$.
   2.2 Check the following:
       (1) The collection $\{N(w) \cap B(u) \mid w \in A(u)\}$ is monotone such that the
           right endpoints of intervals in A(u) and the left endpoints of intervals
           in B(u) can be uniquely partitioned with $R(u_{SL})$ located on the first
           R-block and $L(u_{SR})$ located on the last L-block.
       (2) Every interval in D(u) is compatible with the block sequence determined
           by the above two sets and the remaining intervals in D(u).
   2.3 If there is any violation, G is not an interval graph and the test is terminated.

3. Vertex replacement:
   3.1 Create a new special interval $u^s$ with $N(u^s) \leftarrow N(u_{SL}) \cup N(u) \cup N(u_{SR})$.
   3.2 Suppose that x is a vertex with its right endpoint in $u^s$ but not its left
       endpoint, and y is a vertex with its left endpoint in $u^s$ but not its right
       endpoint. Remove edge (x, y) if it exists.
   3.3 Remove u, $u_{SL}$, $u_{SR}$ and vertices whose left endpoints and right endpoints
       are both contained in $u^s$.
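Step 1.2 of Algorithm 1 is a purely set-theoretic computation on adjacency lists. The Python sketch below illustrates it for an error-free graph. It assumes the graph is a dict mapping each vertex to its set of neighbors, that STA(u) is non-empty, and that u itself is excluded from the second-tier neighborhood (this exclusion is our reading, since u is adjacent to every vertex of STA(u)). The function name `partition_sta` and the sample graph are hypothetical.

```python
def partition_sta(adj, u, sta):
    """Sketch of Step 1.2: split STA(u) into A(u) and B(u)."""
    # Second-tier neighborhood: neighbors of STA(u) that are not neighbors of u.
    second_tier = set().union(*(adj[w] for w in sta)) - adj[u] - {u}
    # u* sees the largest part of the second tier; ties are broken by name.
    u_star = max(sorted(sta), key=lambda w: len(adj[w] & second_tier))
    ll = adj[u_star] & second_tier
    rr = second_tier - ll
    # A(u) consists of the vertices of STA(u) adjacent to something in LL(u).
    a = set(sta) & set().union(*(adj[x] for x in ll))
    b = set(sta) - a
    return a, b, ll, rr


# Hypothetical interval graph: a and b strictly overlap u, c contains u,
# d lies inside u, ll and rr are second-tier neighbors.
graph = {
    "u": {"a", "b", "c", "d"},
    "a": {"u", "c", "ll"},
    "b": {"u", "c", "rr"},
    "c": {"u", "a", "b", "d"},
    "d": {"u", "c"},
    "ll": {"a"},
    "rr": {"b"},
}
A, B, LL, RR = partition_sta(graph, "u", {"a", "b"})
```

Which side ends up being called "left" depends only on the choice of u*; a tie resolved the other way would produce the mirror image of the same partition.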
Fig. 2. An example of the Interval-graph-test

An example of the Interval-graph-test is shown in Figure 2. The left half
of Figure 2 is the interval graph at the beginning of the iteration in which
interval u is processed. Intervals $u_{SL}$ and $u_{SR}$ are the two special intervals strictly
overlapping u. The corresponding block sequence of $u_{SL}$ is $\{L(u_{SL})\}$, $\{R(1)\}$,
$\{L(4), L(5)\}$, $\{R(2), R(3)\}$, $\{L(6), L(u)\}$, $\{R(u_{SL})\}$, and the corresponding
block sequence of $u_{SR}$ is $\{L(u_{SR})\}$, $\{R(7), R(8)\}$, $\{L(10), L(11)\}$, $\{R(u)\}$,
$\{L(12)\}$, $\{R(u_{SR})\}$. In neighborhood classification, the neighborhood of u is
classified into A(u) = {14, 15}, B(u) = {13}, C(u) = {16}, and D(u) =
{4, 5, 6, 7, 8, 9}. Based on the monotone property of A(u) and B(u), as well as the
compatible property of D(u) and the block sequence decided by A(u) and B(u),
we can obtain the block sequence within u, say $\{L(u)\}$, $\{R(u_{SL})\}$, $\{L(9), L(13)\}$,
$\{R(6)\}$, $\{L(8)\}$, $\{R(4), R(5)\}$, $\{L(7)\}$, $\{R(14)\}$, $\{L(u_{SR})\}$, $\{R(u)\}$. In the vertex
replacement step, do the following:

1. Create a new special interval $u^s$ with $N(u^s) = N(u_{SL}) \cup N(u) \cup N(u_{SR})$ =
   {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}, and associate $u^s$ with the
   block sequence that records the relative positions of endpoints within $u^s$.

2. Delete all edges connecting vertices in {4, 5, 6, 14, 15} and {7, 8, 9, 13}.

3. Remove intervals $u_{SL}$, $u_{SR}$, and intervals contained in $u^s$ (namely, intervals
   4, 5, 6, 7, 8, and 9).

At the end of this iteration, the corresponding interval graph becomes the
right half of Figure 2.

We can prove that our algorithm correctly decides whether a graph G is an interval
graph, based on the following lemmas and theorems (the details
are omitted).

Lemma 1. If graph G is an interval graph, then the collections
$\{N(w) \cap B(u) \mid w \in A(u)\}$ and $\{N(w) \cap A(u) \mid w \in B(u)\}$ are both monotone.

Lemma 2. If $\{N(w) \cap B(u) \mid w \in A(u)\}$ is monotone, then the right endpoints
in A(u) and the left endpoints in B(u) can be partitioned into $LB_2, \ldots, LB_n$ and
$RB_1, RB_2, \ldots, RB_{n-1}$ respectively, such that $LB_1, RB_1, \ldots, LB_n, RB_n$ is
the left-right block sequence within u, where $LB_1 = \{L(u)\}$ and $RB_n = \{R(u)\}$.

Theorem 1. A graph is an interval graph iff the following conditions hold for
each iteration of the interval-graph-test algorithm:

1. The collection of sets $\{N(w) \cap B(u) \mid w \in A(u)\}$ is monotone and the right
   endpoints of A(u) and the left endpoints of B(u) can be uniquely partitioned
   with $R(u_{SL})$ located on the first right block and $L(u_{SR})$ located on the last
   left block.

2. Every interval in D(u) is compatible with the block sequence determined by
   the above two sets and the remaining intervals in D(u).

If the given graph G is an interval graph, then the proposed algorithm will
yield an interval model of graph G; otherwise, the algorithm will terminate in
step 2 of the Interval-graph-test.



4 Treating the Errors 

In this section, we present the clustering version of interval graph testing. The 
method to perform neighbor classification on noisy data will be discussed in 
Section 4.1. Section 4.2 describes block sequence determination while taking
FNs and FPs into account. The complete clustering version of the interval graph
test (under noise) is summarized in Section 4.3.

4.1 The Error-Tolerant Neighborhood Classification

If the input data contain errors, neighborhood classification becomes more
intricate. However, based on clustering analysis on the neighborhood and second-tier
neighborhood of u, we are able to classify neighbors of interval u roughly into
four sets A(u), B(u), C(u), and D(u). Our strategy is to classify the second-tier
neighbors of u, N(N[u]), into LL(u) and RR(u) first, and then classify the
neighbors of u into A(u), B(u), C(u) and D(u) based on LL(u) and RR(u). Let
$OV(w, v) = |N[w] \cap N[v]|$ denote the overlap function between two intervals w
and v. The overlap function is used to measure the degree of overlapping for each
pair of intervals in N(N[u]). The clustering of LL(u) and RR(u) uses a greedy
strategy based on the overlap function. The classification of LL(u) and RR(u)
is described in Algorithm 2 below.

Algorithm 2 The LL-RR-classification Algorithm

1. For each interval in N(N[u]), associate it with a cluster that consists of that interval.

2. Calculate OV(w, v) for each pair of intervals in N(N[u]).

3. Select a pair of intervals w and v such that these two intervals are in different
   clusters and attain the highest OV(w, v) value. Merge the corresponding clusters
   of w and v into one cluster.

4. Reiterate Step 3 until only two clusters remain.

5. Let one cluster be LL(u) and the other be RR(u).
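Because the OV values never change during the merging, Steps 3 and 4 of Algorithm 2 can be carried out with a single pass over the pairs sorted by decreasing overlap. The Python sketch below illustrates this; it assumes the graph is a dict of neighbor sets, and the function name `ll_rr_clusters`, the tie-breaking order and the sample graph are our own choices.

```python
from itertools import combinations


def ll_rr_clusters(adj, second_tier):
    """Greedily merge clusters by highest OV(w, v) until two remain."""
    closed = {v: set(adj[v]) | {v} for v in second_tier}    # N[v]
    cluster = {v: {v} for v in second_tier}                 # vertex -> its cluster
    pairs = sorted(combinations(sorted(second_tier), 2),
                   key=lambda p: -len(closed[p[0]] & closed[p[1]]))
    remaining = len(cluster)
    for w, v in pairs:
        if remaining <= 2:
            break
        if cluster[w] is cluster[v]:
            continue                                        # already together
        merged = cluster[w] | cluster[v]
        for x in merged:
            cluster[x] = merged
        remaining -= 1
    return {frozenset(c) for c in cluster.values()}


# Hypothetical second-tier neighborhood: a triangle and a disjoint edge.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {5}, 5: {4}}
clusters = ll_rr_clusters(graph, {1, 2, 3, 4, 5})
```

The algorithm does not say which of the two clusters is the left one; as in Step 5, either may be named LL(u).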

An example explaining neighborhood classification is shown in Figure 3.
In this case, the input data are noisy, but we depict only some of the errors in this
figure. The solid lines and the dotted lines represent intervals overlapping u and
those not overlapping u in the input data, respectively. All intervals depicted are
located at the original “correct” position. Thus, 4 overlaps u originally but does
not overlap u in the input data due to a FN, and 10 does not overlap u originally
but overlaps u in the input data due to a FP. Furthermore, assume that there are
FPs between intervals 1 and 11 and between 2 and 11. At the first iteration,
merge the corresponding clusters of 1 and 2 into one cluster, since OV(1, 2) = 4
is the highest. At the second and the third iterations, merge the corresponding
clusters of 2 and 4, and the corresponding clusters of 9 and 11, respectively.
Finally, we have that LL(u) = {1, 2, 4} and RR(u) = {9, 11}.

Fig. 3. An example for neighborhood classification

We classify STA(u) into A(u) and B(u) using a simple heuristic rule: intervals
in A(u) should not overlap any interval in RR(u), and intervals in B(u)
should not overlap any interval in LL(u). Thus, the overlapping relations between
A(u) and RR(u), and between B(u) and LL(u), could be considered as FPs. For
each interval w in STA(u), if the number of FPs due to classifying w into A(u) is
less than the number of FPs due to classifying w into B(u), we conclude that w
is in A(u); otherwise, we conclude that w is in B(u).
Such a classification scheme is summarized in Algorithm 3.

Algorithm 3 The A-B-classification Algorithm

1. Calculate the error functions of w as follows:
   $E_A(w) \leftarrow |\{(w, v) \mid v \in N(w) \cap RR(u)\}|$
   $E_B(w) \leftarrow |\{(w, v) \mid v \in N(w) \cap LL(u)\}|$.

2. Classify w into A(u) or B(u):
   If $E_A(w) < E_B(w)$ then classify w into A(u),
   else classify w into B(u).
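Algorithm 3 amounts to comparing two counts per interval. A minimal Python sketch follows, assuming the graph is a dict of neighbor sets; the function name `ab_classify` and the small example are illustrative and do not reproduce Figure 3.

```python
def ab_classify(adj, sta, ll, rr):
    """Assign each w in STA(u) to A(u) or B(u) by counting implied FPs."""
    a, b = set(), set()
    for w in sta:
        e_a = len(adj[w] & rr)   # FPs implied by placing w in A(u)
        e_b = len(adj[w] & ll)   # FPs implied by placing w in B(u)
        (a if e_a < e_b else b).add(w)   # ties go to B(u), as in Step 2
    return a, b


# Hypothetical data: 3 sees only the left side, 7 only the right, 6 is tied.
graph = {3: {1, 2}, 6: {1, 9}, 7: {9}}
A, B = ab_classify(graph, {3, 6, 7}, ll={1, 2}, rr={9})
```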

In the example of Figure 3, $E_A(3) = 0$, $E_B(3) = 3$, $E_A(5) = 0$, $E_B(5) = 2$,
$E_A(6) = 1$, $E_B(6) = 1$, $E_A(7) = 1$, $E_B(7) = 0$, $E_A(8) = 2$, $E_B(8) = 0$,
$E_A(10) = 2$, $E_B(10) = 0$. Thus, A(u) = {3, 5}, B(u) = {6, 7, 8, 10}.

The above sets A(u), B(u), LL(u) and RR(u) could still be misclassified due
to those FPs and FNs related to interval u itself. To prevent this kind of error
(or to minimize its effect), we shall reclassify intervals currently in LL(u) ∪ A(u)
into new LL(u) and A(u) as follows (the reclassification of RR(u) ∪ B(u) into
new RR(u) and B(u) can be done similarly). Denote LL(u) ∪ A(u) by L-part(u),
and RR(u) ∪ B(u) by R-part(u). To reclassify intervals currently in LL(u) ∪ A(u)
into new LL(u) and A(u), it suffices to determine the location of L(u).

Once L(u) is located, then those intervals of L-part(u) whose right endpoints
are to the right (resp. left) of L(u) are considered neighbors, A(u) (resp.
non-neighbors, LL(u)), of u. We shall locate the right endpoints of intervals in
L-part(u) first, and then decide the position of L(u). To do that, we need to
determine the relative positions among right endpoints of intervals in L-part(u).
Interestingly enough, R-part(u) will play an important role in this process based
on the following simple lemma.

Lemma 3. Let S and T be two sets of intervals. If the right (resp. left) endpoint
of every interval in T is to the right (resp. left) of the right (resp. left) endpoint
of every interval in S, then the right (resp. left) endpoint of interval w in S with
the largest $|N(w) \cap T|$ value is the rightmost right endpoint (resp. leftmost left
endpoint) among all right (resp. left) endpoints of intervals in S.

Based on Lemma 3, we shall order right endpoints of intervals in L-part(u)
from right to left iteratively as follows. Initially, set S to be L-part(u) and T
to be R-part(u). Note that S and T will be changed at each iteration. We shall
maintain that the right endpoint of every interval in T is to the right of the
right endpoints of intervals in S. Thus, we shall make the right endpoint of
interval $w_{|L\text{-part}(u)|}$ in S with the largest $|N(w_{|L\text{-part}(u)|}) \cap T|$ value to be the
rightmost right endpoint of intervals in S (= L-part(u)). Next, delete
interval $w_{|L\text{-part}(u)|}$ from S and add $w_{|L\text{-part}(u)|}$ into T. Now, make the right endpoint
of interval $w_{|L\text{-part}(u)|-1}$ in the resultant S with the largest $|N(w_{|L\text{-part}(u)|-1}) \cap T|$
value to be the rightmost right endpoint of intervals in the remaining S
(= L-part(u) − $\{w_{|L\text{-part}(u)|}\}$). Thus, $R(w_{|L\text{-part}(u)|-1})$ is to the left of
$R(w_{|L\text{-part}(u)|})$, but to the right of all right endpoints of other intervals in
L-part(u). Then, delete interval $w_{|L\text{-part}(u)|-1}$ from S and add $w_{|L\text{-part}(u)|-1}$
into T. Reiterate the above
process until all right endpoints of intervals in L-part(u) have been ordered.

After that, call the ordered right endpoints of intervals in L-part(u) from left
to right $R(w_1), R(w_2), \ldots, R(w_{|L\text{-part}(u)|})$. In some cases, due to noise,
although $|N(x) \cap T| > |N(y) \cap T|$ (which would entail that R(x) is to the right
of R(y)), R(x) might be, in fact, to the left of R(y). However, if the error rate
is quite small (say, no more than 5%), we can expect that R(x) will be ordered
to the right of R(y) with high probability. Thus, we can obtain the approximate
ordering of the right endpoints of intervals in L-part(u). Similarly, we can also
order left endpoints of intervals in R-part(u) from left to right as $L(v_1), L(v_2),
\ldots, L(v_{|R\text{-part}(u)|})$.

To decide the position of L(u), we calculate the “cost” of L(u) for each
position in which L(u) could be placed. If L(u) is placed between $R(w_i)$ and $R(w_{i+1})$, u
must overlap all intervals $w_j$ with $j > i$, otherwise $(u, w_j)$ is a FN. Moreover,
intervals $w_j$ and $w_k$ such that $j, k > i$ must overlap each other, otherwise
$(w_j, w_k)$ is a FN. On the other hand, intervals $w_j$ with $j \le i$ should not
overlap u, otherwise $(w_j, u)$ is a FP. Let ErrL(u, i) be the total number of FNs
and FPs if L(u) is placed between $R(w_i)$ and $R(w_{i+1})$. Thus,
$ErrL(u, i) = |\{(u, w_j) \mid (u, w_j) \notin E(G) \text{ and } j > i\}|
+ |\{(w_j, w_k) \mid (w_j, w_k) \notin E(G) \text{ and } j, k > i\}|
+ |\{(w_j, u) \mid (w_j, u) \in E(G) \text{ and } j \le i\}|$.
Note that ErrL(u, 0) is defined as the
total number of errors incurred by placing L(u) to the left of all the right endpoints of
intervals in L-part(u). We conclude that L(u) should be placed between $R(w_i)$
and $R(w_{i+1})$ if ErrL(u, i) is the minimum among all of the ErrL values. A similar
strategy can be used to decide the position of R(u). The heuristic to distinguish
neighbors and non-neighbors of u is described in Algorithm 4.
Algorithm 4 The Neighborhood-decision Algorithm

1. Order the right endpoints of intervals in L-part(u) from left to right as
   $R(w_1), R(w_2), \ldots, R(w_{|L\text{-part}(u)|})$ as follows:
   1.1 Let S be L-part(u), T be R-part(u), and i be |L-part(u)|.
   1.2 Let $w^*$ be an interval in S with the largest $|N(w^*) \cap T|$, and denote
       $R(w^*)$ by $R(w_i)$.
   1.3 Delete $w_i$ from S and add $w_i$ into T.
   1.4 Decrease i by 1.
   1.5 Reiterate Step 1.2 to Step 1.4 until S is empty.

2. For $0 \le i \le |L\text{-part}(u)|$, let
   $ErrL(u, i) = |\{(u, w_j) \mid (u, w_j) \notin E(G) \text{ and } j > i\}|
   + |\{(w_j, w_k) \mid (w_j, w_k) \notin E(G) \text{ and } j, k > i\}|
   + |\{(w_j, u) \mid (w_j, u) \in E(G) \text{ and } j \le i\}|$.

3. If ErrL(u, t) is the minimum among all ErrL's, we conclude that L(u) should be placed
   between $R(w_t)$ and $R(w_{t+1})$. Let $A(u) = \{w_i \mid i > t\}$ and $LL(u) = \{w_j \mid j \le t\}$.

4. Order the left endpoints of intervals in R-part(u) from left to right as
   $L(v_1), L(v_2), \ldots, L(v_{|R\text{-part}(u)|})$ as follows:
   4.1 Let S be R-part(u), T be L-part(u), and i be 1.
   4.2 Let $v^*$ be an interval in S with the largest $|N(v^*) \cap T|$, and denote
       $L(v^*)$ by $L(v_i)$.
   4.3 Delete $v_i$ from S and add $v_i$ into T.
   4.4 Increase i by 1.
   4.5 Reiterate Step 4.2 to Step 4.4 until S is empty.

5. For $0 \le i \le |R\text{-part}(u)|$, let
   $ErrR(u, i) = |\{(u, v_j) \mid (u, v_j) \notin E(G) \text{ and } j \le i\}|
   + |\{(v_j, v_k) \mid (v_j, v_k) \notin E(G) \text{ and } j, k \le i\}|
   + |\{(v_j, u) \mid (v_j, u) \in E(G) \text{ and } j > i\}|$.

6. If ErrR(u, t) is the minimum among all ErrR's, we conclude that R(u) should be placed
   between $L(v_t)$ and $L(v_{t+1})$, and let $B(u) = \{v_i \mid i \le t\}$ and $RR(u) = \{v_j \mid j > t\}$.
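Steps 1 to 3 of Algorithm 4 can be sketched in Python as follows (Steps 4 to 6 are symmetric). The sketch assumes the graph is a dict of neighbor sets, breaks ties by vertex name and by the smallest position, and uses function names (`order_right_endpoints`, `place_left_endpoint`) and a noise-free sample graph that are our own.

```python
from itertools import combinations


def order_right_endpoints(adj, l_part, r_part):
    """Step 1: order L-part(u) by right endpoint, returned left to right."""
    s, t = set(l_part), set(r_part)
    picked = []                      # rightmost endpoint is found first
    while s:
        w = max(sorted(s), key=lambda x: len(adj[x] & t))
        picked.append(w)
        s.remove(w)
        t.add(w)
    return picked[::-1]


def place_left_endpoint(adj, u, order):
    """Steps 2-3: choose t minimizing ErrL(u, t); return (t, A(u), LL(u))."""
    best_t, best_err = 0, None
    for t in range(len(order) + 1):
        left, right = order[:t], order[t:]        # w_j with j <= t / j > t
        err = sum(1 for w in right if w not in adj[u])                         # FNs with u
        err += sum(1 for a, b in combinations(right, 2) if b not in adj[a])    # FNs inside
        err += sum(1 for w in left if w in adj[u])                             # FPs with u
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t, set(order[best_t:]), set(order[:best_t])


# Hypothetical noise-free graph: 1, 2 lie left of u = 0; 3, 4 overlap u; 5 is on the right.
graph = {0: {3, 4, 5}, 1: {2}, 2: {1, 3}, 3: {0, 2, 4}, 4: {0, 3, 5}, 5: {0, 4}}
order = order_right_endpoints(graph, {1, 2, 3, 4}, {5})
t, A, LL = place_left_endpoint(graph, 0, order)
```

On this clean input the minimum cost is zero and the recovered split coincides with the true one; with noisy input the same code returns the position with the fewest implied errors.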

For example, in Figure 3, we can order right endpoints of intervals in
L-part(u) from left to right as R(1), R(2), R(5), R(4), R(3), and order left
endpoints of intervals in R-part(u) from left to right as L(7), L(6), L(8), L(9),
L(10), L(11). Furthermore, ErrL(u, 2) = 1 is the minimum among all ErrL's,
and ErrR(u, 3) = 1 is the minimum among all ErrR's. Thus, we conclude that
L(u) should be located between R(2) and R(5), and R(u) should be located
between L(8) and L(9). Hence, L-part(u) and R-part(u) could be reclassified into
LL(u) = {1, 2}, A(u) = {3, 4, 5}, B(u) = {6, 7, 8}, RR(u) = {9, 10, 11}, and the
FPs and FNs relative to u itself have been corrected.



4.2 Deciding Endpoint Block Sequence under the Influence of FNs
and FPs

In this section we determine the left-right block sequence within u on noisy
data. The monotone collection $\{N(w) \cap B(u) \mid w \in A(u)\}$ provides a very strong
structural property for interval graphs. This property is stable enough for us to
obtain a “good” left-right block sequence within interval u. In case the above
collection of sets does not satisfy the monotone property, one could remove
some elements and/or add some elements into the sets to make it satisfy the
monotone property. We denote the removed elements as removals and the added
elements as fill-ins. The removals and fill-ins could be considered as FPs and FNs,
respectively. Note that it is a relative matter to decide removals and fill-ins, and
there is a trade-off in determining FPs and FNs. Suppose we suspect an edge to
be a FP. There are two possibilities. One is that we simply remove this edge. The
other is that we let it stay, which would possibly create some FN(s) we need to
fill in to preserve the monotone property. Our strategy is to detect and remove
potential FPs first, and then deal with the FNs. Note that the minimum fill-in
problem is NP-complete [22] and a polynomial approximation for the problem
has been proposed in [18].

We use the FP-screening algorithm in Algorithm 5 to determine a FP.
Let $w_1, w_2, \ldots, w_{|A(u)|}$ be a list of the intervals in A(u) ordered according to
their ascending $|N(w) \cap B(u)|$ values. If $\{N(w) \cap B(u) \mid w \in A(u)\}$ is monotone, we
should have $N(w_i) \cap B(u) \subseteq N(w_j) \cap B(u)$ for all $i < j$. Since data is noisy, this
might not hold for all $i < j$, but it should hold with high probability due to the low
error rate. So for each $v \in N(w_i) \cap B(u)$, if
$|\{j \mid j > i \text{ and } v \notin N(w_j) \cap B(u)\}| > 3$, the
entry $(w_i, v)$ is considered a FP. The threshold is set to be 3 since the probability
that there are more than three FPs in the same interval is relatively low.

Algorithm 5 The FP-screening Algorithm

1. Sort intervals in A(u) into a list $w_1, w_2, \ldots, w_{|A(u)|}$ according to their
   ascending $|N(w) \cap B(u)|$ values.

2. For each $w_i \in A(u)$ and each $v \in N(w_i) \cap B(u)$, if
   $|\{j \mid i < j \text{ and } v \notin N(w_j) \cap B(u)\}| > 3$, the pair
   of intervals $(w_i, v)$ is considered a FP. Remove edge $(w_i, v)$.
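The screening rule can be sketched in Python as follows. The sketch assumes the graph is a dict of neighbor sets and keeps the threshold as a parameter; the strict comparison against 3 follows the prose above as printed, and the function name `fp_screen`, the tie-breaking by name and the sample data are hypothetical.

```python
def fp_screen(adj, a_set, b_set, threshold=3):
    """Return the edges (w_i, v) flagged as false positives."""
    nb = {w: adj[w] & b_set for w in a_set}                 # N(w) ∩ B(u)
    order = sorted(a_set, key=lambda w: (len(nb[w]), w))    # ascending size
    flagged = []
    for i, w_i in enumerate(order):
        for v in sorted(nb[w_i]):
            # In a monotone collection v would appear in every later set.
            missing = sum(1 for w_j in order[i + 1:] if v not in nb[w_j])
            if missing > threshold:
                flagged.append((w_i, v))
    return flagged


# Hypothetical data: b3 appears only in the smallest set, so (a1, b3) is suspicious.
graph = {"a1": {"b3"}, "a2": {"b1"}, "a3": {"b1"},
         "a4": {"b1", "b2"}, "a5": {"b1", "b2"}}
fps = fp_screen(graph, set(graph), {"b1", "b2", "b3"})
```

The caller would then delete the flagged edges from the graph before the fill-in step.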

After the FPs are determined and removed, we determine fill-ins that make
the collection $\{N(w) \cap B(u) \mid w \in A(u)\}$ monotone using the greedy
strategy shown in Algorithm 6. Initially, consider all intervals in A(u) unselected.
For each unselected interval w in A(u), define its “fill-in cost” to be the minimum
number of edges whose addition will satisfy $N(w) \cap B(u) \subseteq N(w') \cap B(u)$ for
every unselected interval $w'$ in A(u), namely, define
$\text{fill-in}(w) = |\{(w', v) \mid v \in N(w) \cap B(u) \text{ and } v \notin N(w') \cap B(u)
\text{ for all } w' \in A(u), w' \text{ is unselected, and } w' \ne w\}|$.
Each time, select the interval, say $w^*$, with the minimum “fill-in
cost” among unselected intervals in A(u). Once $w^*$ has been selected, add all
edges counted in fill-in($w^*$) and mark $w^*$ as a selected interval. Reiterate the above
process until all intervals in A(u) are selected.

Algorithm 6 The FN-screening Algorithm

1. Select an unselected interval $w^*$ in A(u) with the minimum |fill-in($w^*$)| value.

2. Consider each element $(w', v)$ counted in |fill-in($w^*$)| as a FN and add edge $(w', v)$.

3. Mark $w^*$ as a selected interval.

4. Reiterate Step 1 to Step 3 until all intervals in A(u) have been selected.
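A minimal Python sketch of the greedy fill-in follows. It assumes the graph is a dict of neighbor sets and works on copies of the sets $N(w) \cap B(u)$, so the input graph is left untouched; the function name `fn_screen`, the tie-breaking by name and the sample data are our own.

```python
def fn_screen(adj, a_set, b_set):
    """Greedily add edges until {N(w) ∩ B(u) | w in A(u)} is monotone."""
    nb = {w: set(adj[w]) & set(b_set) for w in a_set}
    unselected = set(a_set)
    added, order = [], []

    def fill_in(w):
        # Edges (w', v) needed so that nb[w] is contained in every other unselected set.
        return [(x, v)
                for x in sorted(unselected - {w})
                for v in sorted(nb[w] - nb[x])]

    while unselected:
        w_star = min(sorted(unselected), key=lambda w: len(fill_in(w)))
        for x, v in fill_in(w_star):
            nb[x].add(v)             # treat (x, v) as a false negative
            added.append((x, v))
        unselected.remove(w_star)
        order.append(w_star)
    return added, order, nb


# Hypothetical data: c is missing element 1, which a and b both contain.
graph = {"a": {1}, "b": {1, 2}, "c": {2, 3}}
added, order, nb = fn_screen(graph, {"a", "b", "c"}, {1, 2, 3})
```

Each selected set is contained in all sets selected after it, so the sets listed in selection order form a chain.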



4.3 The Clustering Version of Interval Graph Test

Finally, we summarize the algorithms of this section in Algorithm 7 below. The
intervals are processed according to an ascending order of their degrees.

Algorithm 7 The Interval-graph-clustering-test

1. Neighbor classification:
   1.1 Let $C(u) \leftarrow \{w \mid N[w] \supset N[u]\}$, $D(u) \leftarrow \{w \mid N[w] \subset N[u]\}$,
       and $STA(u) \leftarrow N(u) - C(u) - D(u)$.
   1.2 Construct LL(u) and RR(u) using the LL-RR-classification algorithm.
   1.3 Partition STA(u) into A(u), B(u) using the A-B-classification algorithm.
   1.4 Distinguish the neighbors and non-neighbors of u using the Neighborhood-decision
       algorithm.
   1.5 Let $u_{SL}$ be the special interval in A(u), and $u_{SR}$ be the special interval in B(u).

2. Block sequence determination:
   2.1 Screen out FPs using the FP-screening algorithm.
   2.2 Fill in FNs using the FN-screening algorithm.
   2.3 Construct the left-right block sequence within u using the collection
       $\{N(w) \cap B(u) \mid w \in A(u)\}$ and intervals in D(u).

3. Vertex replacement:
   3.1 Create a new special interval $u^s$ with $N(u^s) \leftarrow N(u_{SL}) \cup N(u) \cup N(u_{SR})$.
   3.2 Suppose that x is a vertex only with its right endpoint in $u^s$ and y is a vertex
       only with its left endpoint in $u^s$. Remove edge (x, y) if it exists.
   3.3 Remove u, $u_{SL}$, and $u_{SR}$ and vertices whose left endpoints and right endpoints
       are both contained in $u^s$.

5 Experimental Results 

We conduct experiments based on synthetic data. We start with a fixed interval 
model and, in each experiment, randomly generate errors on the edge connec- 
tions, then feed the resultant graph to our algorithm to get a left-right block 
ordering. Three fixed graphs of sizes 100, 200, and 400 are used. These graphs 
are generated randomly under the constraint that the number of endpoints an 
interval contains (roughly corresponds to its “coverage”) ranges from 5 to 15. 
The combined error rates of FPs and FNs are set to be 3%, 5% and 10%, re- 
spectively. Within each error percentage, set the ratio of the number of FPs 
and that of FNs to be 1 to 4, namely, every generated FP accompanies 4 FNs. 
For each combination of graph size and error rate, we repeat the experiment
50 times using different random seeds. The results are evaluated by comparing
the resultant interval ordering with the original ordering, based on the
measurement defined below.

Regard the position of an interval as the position of the “left endpoint” of
the interval. For an interval u, let $d_1$ be the number of intervals ordered to the
left of u whose indices are greater than u, and $d_2$ the number of intervals
ordered to the right of u whose indices are less than u. Let the displacement
d(u) of interval u be the larger of $d_1$ and $d_2$. The displacement d(u) gives an
approximate measure of the distance of interval u from its “correct” position. It
should be noted that defining an exact measure is difficult here since many other
intervals have to be moved simultaneously in order to place a particular interval
“correctly”. We use the following criterion for measuring the total deviation of
the resultant ordering from the original one: If the displacement of an interval u
is more than 4, we say u is a jump interval, which means that the position of u
is quite far from its original position. For example, in Figure 4, d(2) = 6 (there
are 6 intervals ordered to the left of interval 2 whose indices are greater than 2),
d(6) = 1, and d(8) = 6 (there are 6 intervals ordered to the right of interval 8
whose indices are less than 8). Thus, interval 2 and interval 8 are jump intervals.
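The displacement measure is easy to compute from a resultant ordering. The Python sketch below assumes the ordering is given as a list of interval indices, where the index of an interval is its position in the original model; the function names and the sample ordering are illustrative and do not reproduce Figure 4.

```python
def displacements(order):
    """Map each interval to d(u) = max(d1, d2) for the given ordering."""
    result = {}
    for pos, u in enumerate(order):
        d1 = sum(1 for w in order[:pos] if w > u)       # larger indices to the left
        d2 = sum(1 for w in order[pos + 1:] if w < u)   # smaller indices to the right
        result[u] = max(d1, d2)
    return result


def jump_intervals(order, threshold=4):
    """Intervals whose displacement exceeds the threshold (4 in the text)."""
    return sorted(u for u, d in displacements(order).items() if d > threshold)


# Hypothetical ordering in which interval 2 has drifted six places to the right.
sample = [1, 3, 4, 5, 6, 7, 8, 2, 9]
d = displacements(sample)
jumps = jump_intervals(sample)
```

Here only interval 2 is a jump interval; every interval it has passed is displaced by just one position.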

We measure the performance of our algorithm by counting the number of
jump intervals in the resultant interval model. Table 1 lists the statistics for
the number of jump intervals. As one can see, when the error rate is less than
5%, the number of jump intervals is less than 10, and even when the error rate
increases to 10%, the number of jump intervals remains less than 15 in most runs,
indicating that the final interval ordering produced by the algorithm is a good
approximation of the original.
Fig. 4. An example of jump intervals

Table 1. Statistics for the jump intervals

Number of          100 vertices        200 vertices        400 vertices
Jump Intervals     3%    5%    10%     3%    5%    10%     3%    5%    10%
0~5                48    34    20      38    28     4      48    40    27
6~10                1     4    17      12    17    11       1     5    13
11~15               0     0     9       0     4    23       0     3     6
16~20               1     2     1       0     …     …       …     …     …

6 Concluding Remark

In this paper we propose a clustering algorithm for interval graph test on noisy 
data. The physical mapping problem in human genome research can be modeled 
as an interval graph recognition problem, if the overlap information is error-free. 
However, data collected from laboratories unavoidably contain errors. Tradi- 
tional recognition algorithms can hardly be applied directly on noisy data, and 
related models for the imperfection are shown to be NP-hard. In our algorithm, 
for two typical error types FPs and FNs, we check the neighborhood data to see 
whether they conform “approximately” to a particular local structure dictated 
by interval graphs to determine whether the overlap information is valid or
noisy. The experimental results show that, when the error percentage is small, 
the clustering algorithm is robust enough to discover certain errors and to correct 
them automatically most of the time. 
Abstract. Data generation for computational testing of optimization
algorithms is a key topic in experimental algorithmics. Recently, concern
has arisen that many published computational experiments are
inadequate with respect to the way test instances are generated. In this paper we
suggest a new research direction that might be useful to cope with the
possible limitations of data generation. The basic idea is to select a
finite set of instances which ‘represent’ the whole set of instances. We
propose a measure of the representativeness of an instance, which we
call $\varepsilon$-representativeness: for a minimization problem, an instance $x_\varepsilon$ is
$\varepsilon$-representative of another instance $x$ if a $(1 + \varepsilon)$-approximate solution
to $x$ can be obtained by solving $x_\varepsilon$. Focusing on a strongly NP-hard single
machine scheduling problem, we show how to map the infinite set of all
instances into a finite set of $\varepsilon$-representative core instances. We propose
to use this finite set of $\varepsilon$-representative core instances to test heuristics.



1 Introduction 

Motivation. The literature addressing computational testing of optimization al- 
gorithms has a long history and, until recently, computational testing has been 
performed primarily through empirical studies. In the operations research
community, concern has arisen that many published computational experiments are
inadequate, especially with respect to the way test instances are generated (see [7], [9],
[10], [1], [15] and [11]). Among the limitations of computational tests is that they 
often involve simply devising a set of supposedly ‘typical’ instances, running the 
algorithm and its competitors on them, and comparing the results. The use- 
fulness and importance of such experiments strongly depends on how ‘typical’ 
the sample instances actually are. Moreover, the number of such comparisons is 
quite limited. As a consequence, it is probably unwise to make any hard and fast
generalizations from this kind of computational test.

* Supported by Swiss National Science Foundation project 20-63733.00/1, “Resource 
Allocation and Scheduling in Flexible Manufacturing Systems” , and by the “Meta- 
heuristics Network”, grant HPRN-CT-1999-00106. 
Our Contribution. In this paper we suggest a new research direction that might
be useful to cope with the possible limitations of data generation. The basic
idea is to select a finite set of typical instances which ‘represent’ the whole set
of instances. The key phrase here is ‘represent’. We concentrate on the concept
of representativeness of a set of problem instances. We define a measure of the
representativeness of an instance that we call $\varepsilon$-representativeness. Given a
minimization problem^ $\mathcal{P}$, an instance $x_\varepsilon$ is $\varepsilon$-representative of another instance $x$
if a $(1 + \delta)$-approximate solution for $x_\varepsilon$ can be transformed in polynomial time
into a $(1 + \delta + \varepsilon)$-approximate solution for $x$, for $\varepsilon, \delta > 0$. In other words, a
‘good’ solution for $x_\varepsilon$ can be mapped into a ‘good’ solution for $x$.

In this paper we restrict ourselves to a strongly NP-hard single machine scheduling
problem, which in the notation of Graham et al. [6] is denoted $1|r_j, q_j|C_{\max}$. For
this problem we propose a methodology for experimental testing of algorithms
which is based on the concept of $\varepsilon$-representativeness. The test methodology
works as follows. For any $\varepsilon > 0$, we map the infinite set of all instances $I$ into
a finite set of instances $I_\varepsilon$ which are $\varepsilon$-representative of the instances in $I$. We
call $I_\varepsilon$ the core set, and the instances belonging to it core instances. More precisely,
we exhibit a linear time algorithm $f$ that transforms any given instance $x$ from $I$
into an instance $x_\varepsilon$ that is $\varepsilon$-representative of instance $x$, and that belongs to
a defined finite set $I_\varepsilon$. Moreover, a $(1 + \delta)$-approximate solution for $x_\varepsilon$ can be
transformed in linear time (by algorithm $g$) into a $(1 + \delta + \varepsilon)$-approximate solution
for $x$, for any $\delta > 0$. Assume now that you have a heuristic that has been
tested on the finite set of core instances $I_\varepsilon$. If $H_\varepsilon$ is an (experimentally) good
heuristic (i.e. an algorithm that returns $(1 + \delta)$-approximate solutions) for
this finite set $I_\varepsilon$ of core instances, then it can be used to get a good algorithm
for the infinite set of instances of problem $1|r_j, q_j|C_{\max}$ (i.e. a
$(1 + \delta + \varepsilon)$-approximate
algorithm). In Figure 1 we give a pictorial view of the $(1 + \delta + \varepsilon)$-approximation
algorithm that works as follows: for any given instance $x$ from $I$, first transform $x$
into $x_\varepsilon$, then apply $H_\varepsilon$ on $x_\varepsilon$, and transform back the obtained solution for $x_\varepsilon$ into
a $(1 + \delta + \varepsilon)$-approximate solution for $x$. It follows that by testing the heuristic on
the core instances, we can obtain an approximation algorithm whose performance
ratio depends on the experimental results and on the representativeness $\varepsilon$ of the
instances used for testing. We remark that this approach can be seen as
‘black-box’ testing and used to test any kind of heuristic (such as simulated annealing,
genetic algorithms, tabu search, ant colony optimization algorithms, etc.).

Unfortunately, we could only find a core instance set whose cardinality is exponential in $1/\varepsilon$. A natural question addressed in this paper is whether we can hope to reduce the size of the core set considerably, i.e., to make it polynomial in $1/\varepsilon$. For strongly NP-hard problems, we exhibit a first negative result, which states that the size of $I_\varepsilon$ cannot be polynomial in $1/\varepsilon$, unless P=NP. (We do not know if weakly NP-hard problems admit a core set of size polynomial in $1/\varepsilon$.) This means that for ‘small’ values of $\varepsilon$ the number of core instances may actually be too large to be fully tested. This limitation suggests a second natural question, that is, what can we say about the performance of an algorithm by

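^ A similar definition can be given for maximization problems.

[Figure 1: the instance $x$ is mapped to $x_\varepsilon$, the heuristic $H_\varepsilon$ returns a $(1+\delta)$-approximate solution for $x_\varepsilon$, which is transformed back into a $(1+\delta+\varepsilon)$-approximate solution for $x$.]

Fig. 1. The resulting $(1+\delta+\varepsilon)$-approximation algorithm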



performing a ‘small’ (polynomial in $1/\varepsilon$) number of tests? A first positive answer to this question can be obtained by using the Monte Carlo method for sampling on the core set. We show that within a ‘small’ number of tests we can check, with high probability, whether an algorithm performs ‘well’ (or better than another algorithm) for most of the core instances.

We believe that this paper presents a new research direction and shows only 
preliminary results. Many questions remain to be explored to understand if it 
can be applied in a direct way for experimental analysis of algorithms/heuristics. 

Techniques. Interestingly, the techniques used to reduce $I$ to $I_\varepsilon$ are a combination of the ‘standard’ techniques used to obtain polynomial time approximation schemes (PTAS) (see [17] for a survey). In the literature there are several polynomial time approximation schemes that make use of the relative error $\varepsilon$ to reduce the number of potentially ‘interesting’ solutions from exponential to polynomial. We exploit the error $\varepsilon$ to reduce the number of ‘interesting’ instances. These strong similarities let us hope that analogous results can be obtained for several other problems that admit a PTAS. Indeed, we claim that, by using the ideas introduced in this paper and the techniques described in [5], the presented approach for testing can also be used for other scheduling problems, such as the identical and unrelated parallel machine scheduling problems with costs, and the flow-shop and job-shop scheduling problems with a fixed number of machines and of operations per job. More generally, we believe that this approach can be extended to several other problems belonging to the class EPTAS^.

Organization of the Paper. The remainder of the paper is organized as follows. Section 2 focuses on the case study of the single machine scheduling problem, for which, in Section 2.1, we derive the core instance set. Section 3 shows the theoretical lower bound on the core set cardinality. Section 4 shows how to sample and how to test a subset of core instances by means of Monte Carlo sampling. Experimental results of the proposed testing methodology will be given in the full version of this paper. Conclusions and future research issues are in Section 5.

^ EPTAS is the class of optimization problems which admit approximation schemes running in time $f(1/\varepsilon) \, n^c$, where $f$ is an arbitrary function, $n$ is the input size and $c$ is a constant independent of $\varepsilon$ (see [3]).



2 Case Study: Scheduling on a Single Machine 

We shall study the following problem. There is a set of $n$ jobs $J_j$ ($j = 1, \ldots, n$). Each job $J_j$ must be processed without interruption for $p_j > 0$ time units on a single machine, which can process at most one job at a time. Each job has a release date $r_j \ge 0$, which is the time when it first becomes available for processing. After completing its processing on the machine, a job requires an additional delivery time $q_j \ge 0$. If $s_j$ ($\ge r_j$) denotes the time $J_j$ starts processing, job $J_j$ is delivered at time $s_j + p_j + q_j$. Delivery is a non-bottleneck activity, in that all jobs may be simultaneously delivered. Our objective is to minimize, over all possible schedules, the maximum delivery time, i.e., $\max_j (s_j + p_j + q_j)$. The problem as stated is strongly NP-hard [12], and in the notation of Graham et al. [6] it is denoted $1|r_j, q_j|C_{\max}$.
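As a small illustration of the objective, the following sketch evaluates the maximum delivery time of a given processing order. The function name `max_delivery_time` and the tuple layout `(r, p, q)` are assumptions made for this example only; each job is started as early as possible.

```python
def max_delivery_time(jobs, order):
    """jobs[j] = (r_j, p_j, q_j); order = processing sequence of job indices."""
    t = 0          # time at which the machine becomes free
    worst = 0      # largest delivery completion time seen so far
    for j in order:
        r, p, q = jobs[j]
        start = max(t, r)            # s_j >= r_j and the machine must be idle
        t = start + p                # no preemption: the job runs for p_j units
        worst = max(worst, t + q)    # delivery ends at s_j + p_j + q_j
    return worst


example_jobs = [(0, 2, 1), (1, 1, 5), (0, 1, 3)]
```

Different orders of the same jobs generally give different objective values, which is what makes the sequencing decision non-trivial.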

The first phase of our test method requires mapping the set of all instances 
to the set of core instances. This is performed in Subsection 2.1. 

2.1 Core Reduction 

We start by defining the set $I_\varepsilon$ of core instances, then we prove that the infinite set $I$ of all instances can be mapped to the finite set $I_\varepsilon$, for every $\varepsilon > 0$. In the rest of the paper we assume, for simplicity of notation, that $1/\varepsilon$ is an integer. Let $\nu = (1 + \frac{1}{\varepsilon})^2 + \frac{2}{\varepsilon} = O(1/\varepsilon^2)$. The set of integers $0, 1, \ldots, k$ is denoted $[k]$.

Definition 1. An instance $x_\varepsilon$ belongs to $I_\varepsilon$ if and only if the following holds:

1. $x_\varepsilon$ has $\nu$ jobs;

2. for every job $j$ of $x_\varepsilon$, $r_j = h \cdot \varepsilon$ and $q_j = i \cdot \varepsilon$ for some $h, i \in [1/\varepsilon]$;

3. for every job $j$ of $x_\varepsilon$, $p_j = i \cdot \frac{\varepsilon}{\nu}$ for some $i \in [\nu/\varepsilon]$.

The size of $I_\varepsilon$ is bounded by $(1/\varepsilon)^{O(1/\varepsilon^2)}$. Indeed, let us say that two jobs are of the same type if they have the same processing, release and delivery time. Clearly, the number of different types is bounded by $\tau = O(1/\varepsilon^5)$. Let us denote these $\tau$ types by $t_1, \ldots, t_\tau$. Let $u_i$ denote the total number of jobs of type $t_i$, for $i = 1, \ldots, \tau$. A core instance is any vector $(u_1, u_2, \ldots, u_\tau)$ with $\sum_{i=1}^{\tau} u_i = \nu$.

Therefore, we have at most $\tau^\nu = (1/\varepsilon)^{O(1/\varepsilon^2)}$ core instances.
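To get a feeling for these quantities, the sketch below computes the number of jobs, the number of job types and the number of count vectors for a given integer $1/\varepsilon$. It assumes the reading $\nu = (1 + 1/\varepsilon)^2 + 2/\varepsilon$ and counts types directly from the value ranges of Definition 1; the function name `core_set_parameters` is hypothetical.

```python
from math import comb


def core_set_parameters(inv_eps):
    """Return (nu, tau, count) for eps = 1/inv_eps, with inv_eps a positive integer."""
    nu = (1 + inv_eps) ** 2 + 2 * inv_eps            # assumed reading of nu
    # (1/eps + 1) release values, (1/eps + 1) delivery values,
    # (nu/eps + 1) processing-time values
    tau = (1 + inv_eps) ** 2 * (nu * inv_eps + 1)
    # vectors (u_1, ..., u_tau) of non-negative integers summing to nu
    count = comb(nu + tau - 1, nu)
    return nu, tau, count


nu, tau, count = core_set_parameters(1)
```

Even for $\varepsilon = 1$ the count is in the millions, which illustrates why exhaustive testing of the core set quickly becomes impractical.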

The following theorem proves that the infinite set $I$ of all instances can be mapped to $I_\varepsilon$. Our approach uses several transformations of the given input instance $x$, each of which may potentially increase the objective function value by a factor of $1 + O(\varepsilon)$. Therefore we can perform a constant number of transformations while still staying within $1 + O(\varepsilon)$ of the original optimum. At the end, the resulting transformed instance $x_\varepsilon$ belongs to $I_\varepsilon$, and any $(1+\delta)$-approximate solution for $x_\varepsilon$ can be transformed in linear time into a $(1+\delta+O(\varepsilon))$-approximate solution for $x$. Throughout this paper, when we describe this type of transformation, we shall say it produces $1 + O(\varepsilon)$ loss. (The proof of the following theorem can be found in the appendix.)

Theorem 1. For any fixed $\varepsilon > 0$, with $1 + 6\varepsilon$ loss, any given instance $x \in I$ can be transformed into an instance $x_\varepsilon \in I_\varepsilon$ in $O(n)$ time, where the constant hidden in the $O(n)$ running time is reasonably small and does not depend on the error $\varepsilon$.

Note that, although the size of $I_\varepsilon$ may be decreased, we prove in Section 3 that $|I_\varepsilon|$ cannot be polynomial in $1/\varepsilon$, unless P=NP.



3 On the Core Set Size 

In 1977, Berman and Hartmanis [2] investigated the density of NP-complete problems. The density of a language $A$ is a function $C_A : \mathbb{N} \to \mathbb{N}$ defined by $C_A(n) = |\{x \in A : |x| \le n\}|$. Recall that a language $A$ is sparse if there exists a polynomial $q$ such that for every $n \in \mathbb{N}$, $C_A(n) \le q(n)$. In [2], Berman and Hartmanis proposed the density conjecture, that no sparse language can be NP-complete. The density conjecture was proved to be equivalent to P $\ne$ NP by Mahaney [13].

In this section we give a general condition that ensures that a problem $\mathcal{P}$ cannot have a core set size which is also polynomial in $1/\varepsilon$. Theorem 2 applies to the problem addressed in this paper. The proof of the following theorem uses the result proved by Mahaney [13].

For any problem $\mathcal{P}$ and for any instance $x$ of $\mathcal{P}$, let $\max(x)$ denote the value of the largest number occurring in $x \in I$, where $I$ denotes the set of instances of problem $\mathcal{P}$. For simplicity, let us focus on minimization problems. Let $SOL(x)$ be a function that associates to any input instance $x \in I$ the set of feasible solutions of $x$. Let $m$ be a function that associates with any instance $x \in I$ and with any feasible solution $y \in SOL(x)$ a positive rational $m(x, y)$ that denotes the measure of solution $y$. The value of any optimal solution of $x$ will be denoted as $m^*(x)$. The following definition formalizes the notion of core reducibility, which maps instances of $I$ to instances of $I_\varepsilon$, but also maps good solutions for $I_\varepsilon$ back to good solutions for $I$.

Definition 2. For any fixed $\varepsilon > 0$, problem $\mathcal{P}$ is core reducible if two functions $f$ and $g$, and a finite set $I_\varepsilon$ of instances exist such that, for any instance $x \in I$:

1. $f(x, \varepsilon) \in I_\varepsilon$;

2. for any $y \in SOL(f(x, \varepsilon))$, $g(x, y, \varepsilon) \in SOL(x)$;

3. $f$ and $g$ are computable in polynomial time with respect to both the length of the instance and $1/\varepsilon$;

4. there exists a polynomial time computable solution $s \in SOL(f(x, \varepsilon))$ such that $m(x, g(x, s, \varepsilon)) \le (1 + \varepsilon) \cdot m^*(x)$.

We are now ready to give the following fundamental result. (The proof of the following theorem can be found in the appendix.)

Theorem 2. Let $\mathcal{P}$ be a strongly NP-hard problem that admits a polynomial $p$ such that $m^*(x) \le p(|x|, \max(x))$, for every input $x \in I$. If $\mathcal{P}$ is core reducible, then the density of $I_\varepsilon$ cannot be polynomial in $1/\varepsilon$, for any $\varepsilon > 0$, unless P=NP.

The two main requirements of Theorem 2 are that $\mathcal{P}$ has solution values that are not too large and that $\mathcal{P}$ is NP-complete in the strong sense. These hold for many problems of interest, including, for example, the bin packing problem, the graph coloring problem, the maximum independent set problem and the minimum vertex cover problem. Theorem 2 also applies to the addressed scheduling problem. Thus we have a general method of considerable power for ruling out the possibility of a core set size polynomial in $1/\varepsilon$.

4 Sampling on the Core Set 

In Subsection 2.1 we have presented a core set for problem $1|r_j, q_j|C_{\max}$ whose cardinality is exponential in $1/\varepsilon$. Moreover, in Section 3 we have shown that it is unlikely that a ‘small’ core set exists. Thus, for ‘small’ values of the parameter $\varepsilon$, the core set may be too big to be fully tested. A way to deal with this problem is to select a subset of the core set which is small enough to be fully tested. In this case the problem is to select an appropriate subset for which any measurement done on the subset is a good estimator of a measurement on the whole core set. In this section we use Monte Carlo sampling to select, at random, an appropriate subset of the core set for algorithm testing.

4.1 Testing by Monte Carlo Sampling 

Let $\mathcal{P}$ be a core reducible problem and $I_\varepsilon$ be the core set of this problem for every fixed $\varepsilon > 0$. Let $A$ be an algorithm for $\mathcal{P}$ which we want to evaluate by experimental testing. Let us define a test to make on $A$, that is, let us define a Boolean function $f : I_\varepsilon \to \{0, 1\}$ over $I_\varepsilon$. We define the set $I_\varepsilon^f = \{x \in I_\varepsilon \mid f(x) = 1\}$ as the pre-image of 1. The function $f$ may be thought of as an indicator of success ($f(x) = 1$) or failure ($f(x) = 0$) of the algorithm $A$ over the core set $I_\varepsilon$. For example, given a core instance $x \in I_\varepsilon$, the algorithm $A$ succeeds when it finds a solution which is within a certain distance from the optimum, while $A$ fails in the opposite case. The problem is to estimate the size of $I_\varepsilon^f$ with respect to the size of $I_\varepsilon$, that is, the fraction $|I_\varepsilon^f| / |I_\varepsilon|$ of ‘successes’ in our test of the algorithm.

An obvious approach to estimating $|I_\varepsilon^f| / |I_\varepsilon|$ is to use the classical Monte Carlo method (see, e.g., [16]). This involves choosing $N$ independent samples from $I_\varepsilon$, say $x_1, \ldots, x_N$, and using the value of $f$ on these samples to estimate the probability that a random choice will lie in $I_\varepsilon^f$. More formally, define the random variables $X_1, \ldots, X_N$ as follows:

$$X_i = \begin{cases} 1 & \text{if } f(x_i) = 1, \\ 0 & \text{otherwise.} \end{cases}$$

By this definition, $X_i = 1$ if and only if $x_i \in I_\varepsilon^f$. Finally, we define the estimator random variable

$$Z = |I_\varepsilon| \, \frac{\sum_{i=1}^{N} X_i}{N}.$$

It is easy to verify that $E[Z] = |I_\varepsilon^f|$. We might hope that with high probability the value of $Z$ is a good approximation to $|I_\varepsilon^f|$. In particular, we want to have a small error with respect to $|I_\varepsilon|$ when using $Z$ as an estimator for $|I_\varepsilon^f|$. Of course, the probability that the approximation is good depends upon the choice of $N$. The following theorem (see also [16]) relates the value of $N$ to $\eta$ and $\delta$.



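Theorem 3. For $\eta \in (0, 1]$ and $\delta \in (0, 1]$, the Monte Carlo method computes an estimator $Z$ such that

$$\Pr\left[ \, \left| Z - |I_\varepsilon^f| \right| \le \eta \, |I_\varepsilon| \, \right] \ge 1 - \delta,$$

provided $N \ge \frac{4}{\eta^2} \ln \frac{2}{\delta}$.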




In practice, after running $A$ on the sample, we count the number of times $A$ has been successful. Let $X$ denote this value ($X$ is linked to the estimator $Z$ by $Z = |I_\varepsilon| \frac{X}{N}$). Theorem 3 implies that, provided the sample is big enough, $|I_\varepsilon^f|$ lies in the interval $\left( (\frac{X}{N} - \eta) |I_\varepsilon|, \, (\frac{X}{N} + \eta) |I_\varepsilon| \right)$, with probability at least $1 - \delta$. Therefore, with probability at least $1 - \delta$, algorithm $A$ is successful for at least $(\frac{X}{N} - \eta) \cdot 100\%$ of core instances.
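The testing procedure can be sketched in a few lines. The sketch assumes the sample-size bound $N \ge \frac{4}{\eta^2} \ln \frac{2}{\delta}$ as read from Theorem 3; the names `sample_size`, `estimate_success_fraction`, `draw_instance` and `succeeds` are hypothetical, and the caller must supply a uniform sampler of core instances and the success test $f$.

```python
import math
import random


def sample_size(eta, delta):
    """Smallest integer N with N >= (4 / eta^2) * ln(2 / delta) (assumed bound)."""
    return math.ceil(4.0 / eta ** 2 * math.log(2.0 / delta))


def estimate_success_fraction(draw_instance, succeeds, eta, delta, rng):
    """Return (N, X, (low, high)): sample size, successes, interval for the success fraction."""
    n = sample_size(eta, delta)
    successes = sum(1 for _ in range(n) if succeeds(draw_instance(rng)))
    fraction = successes / n
    # interval (X/N - eta, X/N + eta), clipped to [0, 1]
    return n, successes, (max(0.0, fraction - eta), min(1.0, fraction + eta))


# Toy usage: "instances" are random numbers and the test succeeds 70% of the time.
n, x, (low, high) = estimate_success_fraction(
    lambda rng: rng.random(), lambda v: v < 0.7, eta=0.1, delta=0.05,
    rng=random.Random(0),
)
```

Note that the number of runs depends only on $\eta$ and $\delta$, not on the size of the core set.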



4.2 Comparison of Two Algorithms 

The goal of this section is to describe a method to compare algorithms. Let $\mathcal{P}$ be a core reducible problem and $I_\varepsilon$ be the core set of this problem for every fixed $\varepsilon > 0$. Let $A$ and $B$ be two algorithms for $\mathcal{P}$ which we want to compare. Let us define a test to make on $A$ and $B$, that is, let us state when an algorithm is ‘better than’, ‘worse than’ or ‘equivalent to’ another algorithm. For example, given a core instance $x \in I_\varepsilon$, the algorithm $A$ is better than $B$ on $x$ if it returns a better solution than $B$.

In the following we are interested in computing how many times $A$ is better than, worse than or equivalent to $B$. For any given algorithms $A$ and $B$, the set $I_\varepsilon$ can be seen as partitioned into three subsets $I_\varepsilon^A$, $I_\varepsilon^B$ and $I_\varepsilon^{A=B}$ of instances, where $I_\varepsilon^A$ is the set of instances for which $A$ is better than $B$, $I_\varepsilon^B$ the opposite, and $I_\varepsilon^{A=B}$ is the set of instances for which $A$ and $B$ are equivalent. We are interested in the size of these subsets and in the following we describe how to approximate these values.

Monaldo Mastrolilli and Leonora Bianchi

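Let $\eta$ and $\delta$ be two parameters, and let $N$ be the size of a sample from the core set. After running $A$ and $B$ on the same sample, we count the number of times $A$ is better than, worse than or equal to $B$. Let $\alpha$, $\beta$ and $\gamma$ denote, respectively, these values. Then, we define $Z_\varepsilon^A = |I_\varepsilon| \frac{\alpha}{N}$, $Z_\varepsilon^B = |I_\varepsilon| \frac{\beta}{N}$ and $Z_\varepsilon^{A=B} = |I_\varepsilon| \frac{\gamma}{N}$ as the estimators of $|I_\varepsilon^A|$, $|I_\varepsilon^B|$ and $|I_\varepsilon^{A=B}|$, respectively. Again it is easy to verify that $E[Z_\varepsilon^A] = |I_\varepsilon^A|$, $E[Z_\varepsilon^B] = |I_\varepsilon^B|$ and $E[Z_\varepsilon^{A=B}] = |I_\varepsilon^{A=B}|$. By Theorem 3, we can say that, provided the sample is big enough (i.e., $N \ge \frac{4}{\eta^2} \ln \frac{2}{\delta}$), $|I_\varepsilon^A|$, $|I_\varepsilon^B|$ and $|I_\varepsilon^{A=B}|$ lie in the intervals $\left( (\frac{\alpha}{N} - \eta) |I_\varepsilon|, (\frac{\alpha}{N} + \eta) |I_\varepsilon| \right)$, $\left( (\frac{\beta}{N} - \eta) |I_\varepsilon|, (\frac{\beta}{N} + \eta) |I_\varepsilon| \right)$ and $\left( (\frac{\gamma}{N} - \eta) |I_\varepsilon|, (\frac{\gamma}{N} + \eta) |I_\varepsilon| \right)$, respectively, with probability at least $1 - \delta$.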
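The counting step of the comparison can be sketched as follows. The names `compare_on_sample`, `fraction_intervals`, `value_a` and `value_b` are hypothetical, and the sketch assumes a minimization problem, so a smaller objective value counts as better.

```python
def compare_on_sample(sample, value_a, value_b):
    """Return (alpha, beta, gamma): how often A is better than, worse than, equal to B."""
    alpha = beta = gamma = 0
    for x in sample:
        a, b = value_a(x), value_b(x)
        if a < b:        # minimization: smaller objective value is better
            alpha += 1
        elif a > b:
            beta += 1
        else:
            gamma += 1
    return alpha, beta, gamma


def fraction_intervals(counts, eta):
    """Intervals (c/N - eta, c/N + eta), clipped to [0, 1], for each count c."""
    n = sum(counts)
    return [(max(0.0, c / n - eta), min(1.0, c / n + eta)) for c in counts]


# Toy usage with stand-in objective functions.
counts = compare_on_sample(range(10), lambda x: x % 3, lambda x: 1)
```

Multiplying the interval end points by $|I_\varepsilon|$ gives the intervals for the subset sizes stated above.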

4.3 Sampling from the Core Set of $1|r_j, q_j|C_{\max}$

The methods described in Subsections 4.1 and 4.2 can be applied to problem $1|r_j, q_j|C_{\max}$ if we can choose, uniformly at random, $N$ independent samples from the core set $I_\varepsilon$ of Definition 1. This can be done without generating the (exponentially large) core set $I_\varepsilon$. Indeed, we can choose, uniformly at random, an instance from $I_\varepsilon$ as follows. Recall that $\tau = O(1/\varepsilon^5)$ is the number of different job types, and that $\nu$ is the number of jobs of any instance from $I_\varepsilon$ (see Subsection 2.1). Choose independently and uniformly at random $\nu$ numbers from $[\tau]$. Each number defines the type of one of the $\nu$ jobs of the instance. It is easy to check that any instance from $I_\varepsilon$ has the same probability to be selected. The number of random bits used is polynomial in $1/\varepsilon$.

We observe that every polynomial algorithm for problem $1|r_j, q_j|C_{\max}$ can be tested in time polynomial in $1/\eta$, $\log(1/\delta)$ and $1/\varepsilon$. Indeed, the required number of experiments is a polynomial function of $1/\eta$ and $\log(1/\delta)$, and the size of every core instance is polynomial in $1/\varepsilon$.
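A minimal sketch of this sampling step is given below. It draws each job type by choosing the release, delivery and processing values of Definition 1 independently and uniformly, which is the same as choosing a type uniformly; the resulting distribution is uniform over ordered $\nu$-tuples of types. The function name `sample_core_instance` is hypothetical, and $\nu$ is passed in by the caller rather than computed, since its closed form is an assumption.

```python
import random
from fractions import Fraction


def sample_core_instance(inv_eps, nu, rng):
    """Return nu jobs (r, p, q) with values on the grids of Definition 1 (eps = 1/inv_eps)."""
    eps = Fraction(1, inv_eps)
    jobs = []
    for _ in range(nu):
        r = rng.randint(0, inv_eps) * eps              # r = h * eps, h in [1/eps]
        q = rng.randint(0, inv_eps) * eps              # q = i * eps, i in [1/eps]
        p = rng.randint(0, nu * inv_eps) * eps / nu    # p = i * eps / nu, i in [nu/eps]
        jobs.append((r, p, q))
    return jobs


instance = sample_core_instance(2, 13, random.Random(1))
```

Exact rational arithmetic is used so that the sampled values lie exactly on the grid, which avoids floating-point drift when instances are compared.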

5 Future Work and Research 

Focusing on a strongly NP-hard single machine scheduling problem, we have shown how to map the infinite set of all instances into a finite set of $\varepsilon$-representative core instances. The $\varepsilon$-representative core instances can be used to test heuristics. In the full version of this paper we will provide experimental applications of the described testing methodology. We remark that this approach can be seen as ‘black-box’ testing and can be used to test any kind of heuristic (such as simulated annealing, genetic algorithms, tabu search, ant colony optimization algorithms, etc.).

We believe that this paper is just an initial step in the exploration of this idea for testing, and so we state some open problems and possible directions.

We have seen that the number of core instances cannot be polynomial in $1/\varepsilon$ when the problem is strongly NP-hard. Does the latter also hold for a weakly NP-hard problem?

By sampling uniformly and testing heuristics on the sampled instances, we can understand if they perform ‘well’ for most of the core instances. Other ways of sampling should be investigated to understand on which instances heuristics perform poorly. With this aim, the statistical technique called importance sampling (see [16] p. 312) might be fruitful.

Finally, we believe that there are possibilities to connect this work to parameterized complexity [4].

A Appendix

A.1 Proof of Theorem 1

We start by describing procedure Core-Reduction, which maps the infinite set $I$ of all instances to a finite set $I_\varepsilon$ of core instances. Then, we analyze this algorithm and show that by applying Core-Reduction, any given instance $x$ from $I$ can be transformed with $(1 + 6\varepsilon)$ loss into a core instance belonging to $I_\varepsilon$, for every $\varepsilon > 0$.

The procedure Core-Reduction consists of scaling the input data and ‘merg- 
ing’ jobs to form an approximate version of the original instance that belongs to 
a finite set of different instances. The procedure Core-Reduction is as follows. 

1. Checking number of jobs. Let $\nu = (1 + \frac{1}{\varepsilon})^2 + \frac{2}{\varepsilon} = O(1/\varepsilon^2)$. If the input instance has no more than $\nu$ jobs, go to point (3), otherwise go to point (2);

2. Grouping small jobs together. Let $P = \sum_j p_j$, $r_{\max} = \max_j r_j$ and $q_{\max} = \max_j q_j$. Round down every release date (delivery time) to the nearest value among the following $\rho = 1 + \frac{1}{\varepsilon} = O(1/\varepsilon)$ values: $0, \varepsilon r_{\max}, 2\varepsilon r_{\max}, \ldots, r_{\max}$ ($0, \varepsilon q_{\max}, 2\varepsilon q_{\max}, \ldots, q_{\max}$). For simplicity of notation, let us use $r_j$ and $q_j$ to denote the resulting rounded release and delivery times of any job $j$. Divide jobs into large and small jobs, according to the length of their processing times, $L = \{j : p_j > \varepsilon P\}$ and $S = \{j : p_j \le \varepsilon P\}$. Let us say that two small jobs, $j$ and $h$, belong to the same class iff $r_j = r_h$ and $q_j = q_h$. The number of distinct job classes is bounded by $k = (1 + \frac{1}{\varepsilon})^2 = O(1/\varepsilon^2)$. Let us partition the set of small jobs into at most $k$ classes $K_1, K_2, \ldots, K_k$. Let $j$ and $h$ be two small jobs from $K_i$ such that $p_j + p_h \le \varepsilon P$. We ‘glue’ together these two jobs to form a composed job with the same release date and delivery time as $j$ (and $h$), and in which the processing time is equal to the sum of their processing times. We repeat this process until at most one job from $K_i$ has processing time not greater than $\varepsilon P / 2$. At the end, all jobs in class $K_i$ have processing times not greater than $\varepsilon P$. The same procedure is performed for every class. At the end of the process, the total number of composed jobs is no larger than $\frac{2}{\varepsilon} + k = \nu$.

3. Normalizing. Let $p_{\max}$, $r_{\max}$ and $q_{\max}$ denote the maximum processing, release and delivery time, respectively, of the instance after steps (1) and (2). Let $LB = \max\{p_{\max}, r_{\max}, q_{\max}\}$. Normalize the instance by dividing every release, delivery and processing time by $LB$.

4. Rounding release and delivery times. Round down every release date and delivery time to the nearest value among the following $\rho = O(1/\varepsilon)$ values: $0, \varepsilon, 2\varepsilon, \ldots, 1$.

5. Rounding processing times. Round down every processing time to the nearest value among the following $\pi = O(1/\varepsilon^3)$ values: $0, \varepsilon/\nu, 2\varepsilon/\nu, \ldots, 1$.
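The five steps above can be sketched in code. This is an illustration under assumptions rather than the authors' implementation: it reads $\nu$ as $(1 + 1/\varepsilon)^2 + 2/\varepsilon$, glues small jobs with a simple greedy pass (one of several gluing orders the description permits), and does not pad the result to exactly $\nu$ jobs. The names `round_down`, `group_small_jobs` and `core_reduction` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def round_down(value, step):
    """Largest multiple of `step` that does not exceed `value` (step > 0)."""
    return (value // step) * step


def group_small_jobs(jobs, eps):
    """Step 2: round r and q to multiples of eps * max, then glue small jobs per class."""
    total = sum(p for _, p, _ in jobs)                        # P
    r_max = max(r for r, _, _ in jobs)
    q_max = max(q for _, _, q in jobs)

    def snap(v, vmax):
        return round_down(v, eps * vmax) if vmax > 0 else v

    rounded = [(snap(r, r_max), p, snap(q, q_max)) for r, p, q in jobs]
    result = [job for job in rounded if job[1] > eps * total]  # large jobs are kept
    classes = defaultdict(list)
    for r, p, q in rounded:
        if p <= eps * total:
            classes[(r, q)].append(p)
    half = eps * total / 2
    for (r, q), times in classes.items():
        acc = None
        for p in times:
            if p > half:                      # heavier than eps*P/2: keep alone
                result.append((r, p, q))
                continue
            acc = p if acc is None else acc + p   # both parts <= eps*P/2, sum <= eps*P
            if acc > half:
                result.append((r, acc, q))
                acc = None
        if acc is not None:                   # at most one light job left per class
            result.append((r, acc, q))
    return result


def core_reduction(jobs, inv_eps):
    """Map a list of (r, p, q) jobs to grid values, with eps = 1/inv_eps."""
    eps = Fraction(1, inv_eps)
    nu = (1 + inv_eps) ** 2 + 2 * inv_eps                 # assumed reading of nu
    jobs = [(Fraction(r), Fraction(p), Fraction(q)) for r, p, q in jobs]
    if len(jobs) > nu:                                    # steps 1 and 2
        jobs = group_small_jobs(jobs, eps)
    lb = max(max(job) for job in jobs) or Fraction(1)     # step 3 (guard: all zeros)
    return [(round_down(r / lb, eps),                     # step 4
             round_down(p / lb, eps / nu),                # step 5
             round_down(q / lb, eps))
            for r, p, q in jobs]


core = core_reduction([(i % 7, 1 + i % 3, (3 * i) % 5) for i in range(60)], 2)
```

Exact fractions are used so that the rounded values land exactly on the grids of steps (4) and (5).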

In the following we analyze Core-Reduction and show that the transformation is with $(1 + 6\varepsilon)$ loss. First we observe that if the number of jobs of the input instance is greater than $\nu$ then, by following the ideas in [14], small jobs can be merged together as described in step (2) with $1 + 3\varepsilon$ loss (see Lemma 1 for a proof of this).

Note that step (3) can be performed with no loss: the normalized instance is equivalent to the one after steps (1) and (2). Observe that after step (3) every processing, release and delivery time is not greater than 1. Now, let us consider the resulting instance $x$ after steps (1), (2) and (3), and the instance $x'$ obtained from $x$ by rounding release, processing, and delivery times as in steps (4) and (5) of Core-Reduction. In steps (4) and (5), by rounding down the values we cannot obtain a modified instance with an optimal value larger than the optimal value of instance $x$. Following the ideas in [8], every feasible solution for the modified instance $x'$ can be transformed into a feasible solution for the original instance just by adding $\varepsilon$ to each job’s starting time, and re-introducing the original delivery and processing times. It is easy to see that the solution value may increase by at most $3\varepsilon$ (observe that there are at most $\nu$ jobs and each processing time may increase by at most $\varepsilon/\nu$).

Let us now focus on the time complexity of Core-Reduction. It is easy to check that steps (1), (3), (4) and (5) of the procedure require linear time. Suppose now that step (2) must be performed, that is, $n > \nu$. The process of partitioning small jobs into $k$ job classes, as described in step (2) of the procedure, can be implemented to take $O(n + k)$ time. Since in this case $n > k$, step (2) also takes $O(n)$ time.

Lemma 1. With $1 + 3\varepsilon$ loss, the number of jobs can be reduced to be at most $\min\{n, O(1/\varepsilon^2)\}$.

Proof. In the following we show that by using the instance with grouped small jobs (as described in step (2) of Core-Reduction) we can get an optimal solution with $1 + 3\varepsilon$ loss.

Following the ideas in [8], every feasible solution for the modified instance with rounded release and delivery times can be transformed into a feasible solution for the original instance just by adding $\varepsilon r_{\max}$ to each job’s starting time, and re-introducing the original delivery times. It is easy to see that the solution value may increase by at most $\varepsilon (r_{\max} + q_{\max}) \le 2 \varepsilon \, opt$, where $opt$ is the optimal value of the original instance (see [8]). Let us consider an optimal schedule $y^*$ for the instance $x$ with rounded release and delivery times. Modify $x$ to obtain instance $x^*$ in which the processing times of all jobs remain unchanged, while release and delivery times are set as follows

$$\tilde{r}_j = r_j, \quad \tilde{q}_j = q_j \quad \text{for } j \in S,$$
$$\tilde{r}_j = s_j^*, \quad \tilde{q}_j = opt(x) - p_j - s_j^* \quad \text{for } j \in L, \qquad (1)$$

where $s_j^*$ denotes the starting time of job $j$ in $y^*$. Clearly $opt(x) = opt(x^*)$.

Merge small jobs as described before and denote by $\tilde{x}$ the resulting modified instance. By showing that there exists a schedule $\tilde{y}$ for $\tilde{x}$ such that $m(\tilde{x}, \tilde{y}) \le (1 + \varepsilon) \, opt(x^*)$ the claim follows.

Consider the following algorithm, known as Extended Jackson’s Rule: whenever the machine is free and one or more jobs are available for processing, schedule an available job with largest delivery time. Apply the Extended Jackson’s Rule to the modified instance $\tilde{x}$. Let us use $\tilde{y}$ to denote the resulting schedule. Let us define a critical job $J_c$ as one that finishes last in $\tilde{y}$, i.e., its delivery is completed last. Associated with a critical job $J_c$ there is a critical sequence consisting of those jobs tracing backward from $J_c$ to the first idle time in the schedule. Let us fix a critical job $J_c$ and denote the last job $J_b$ in the critical sequence with $\tilde{q}_c > \tilde{q}_b$ as the interference job for the critical sequence. Let $B$ denote the set of jobs processed after $J_b$ in the critical sequence. By the way that $J_b$ was chosen, clearly $\tilde{q}_j \ge \tilde{q}_c$ for all jobs $J_j \in B$. We claim that if there is no interference job then $m(\tilde{x}, \tilde{y}) = opt(x^*)$, otherwise $m(\tilde{x}, \tilde{y}) \le opt(x^*) + p_b$. It is easy to show that $m(\tilde{x}, \tilde{y}) \le p_b + \min_{j \in B} \tilde{r}_j + \sum_{j \in B} p_j + \min_{j \in B} \tilde{q}_j$ and $opt(x^*) \ge \min_{j \in B} \tilde{r}_j + \sum_{j \in B} p_j + \min_{j \in B} \tilde{q}_j$. By the way that the interference job $J_b$ was chosen, we have $\tilde{q}_c = \min_{j \in B} \tilde{q}_j$. Let $U$ denote the set of jobs obtained from $B$ by “unsticking” small jobs. We can bound the length of an optimal schedule for $x^*$ as $opt(x^*) \ge \min_{j \in U} \tilde{r}_j + \sum_{j \in U} p_j + \min_{j \in U} \tilde{q}_j$. It is easy to see that $\min_{j \in B} \tilde{r}_j + \sum_{j \in B} p_j + \min_{j \in B} \tilde{q}_j = \min_{j \in U} \tilde{r}_j + \sum_{j \in U} p_j + \min_{j \in U} \tilde{q}_j$. Indeed, we have glued only small jobs having the same release and delivery times, and therefore $\min_{j \in B} \tilde{q}_j = \min_{j \in U} \tilde{q}_j$ and $\min_{j \in B} \tilde{r}_j = \min_{j \in U} \tilde{r}_j$. Therefore, $m(\tilde{x}, \tilde{y}) \le p_b + opt(x^*)$, and if there is no interference job then $m(\tilde{x}, \tilde{y}) = opt(x^*)$.

Then if there is no interference job or $J_b$ is a small job we have $m(\tilde{x}, \tilde{y}) \le (1 + \varepsilon) \cdot opt(x^*)$, by definition of small jobs (“glued” or not). Now, consider the last case in which the interference job $J_b$ is a large job. By the definition of the Extended Jackson’s Rule, no job $j \in B$ could have been available at the time when $J_b$ is processed, since otherwise such a job $j$ would have taken priority over job $J_b$. Thus, $\tilde{r}_c > \tilde{r}_b$ and $\tilde{q}_c > \tilde{q}_b$, but $J_b$ cannot be a large job since by construction if $\tilde{r}_c > \tilde{r}_b$ then $\tilde{q}_b > \tilde{q}_c$.
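The Extended Jackson’s Rule used in the proof can be sketched as follows. The function name `extended_jackson` and the tuple layout `(r, p, q)` are assumptions of this example; ties between jobs with equal delivery time are broken by job index, a choice the rule itself leaves open.

```python
import heapq


def extended_jackson(jobs):
    """jobs[j] = (r_j, p_j, q_j). Returns (order, maximum delivery time)."""
    pending = sorted(range(len(jobs)), key=lambda j: jobs[j][0])  # by release date
    ready = []        # max-heap on delivery time, stored as (-q_j, j)
    t = 0             # current time on the machine
    i = 0             # next position in `pending`
    order = []
    worst = 0
    while i < len(pending) or ready:
        if not ready and jobs[pending[i]][0] > t:
            t = jobs[pending[i]][0]           # machine idles until the next release
        while i < len(pending) and jobs[pending[i]][0] <= t:
            j = pending[i]
            heapq.heappush(ready, (-jobs[j][2], j))
            i += 1
        _, j = heapq.heappop(ready)           # available job with largest delivery time
        t += jobs[j][1]                       # process it without interruption
        worst = max(worst, t + jobs[j][2])
        order.append(j)
    return order, worst


order, value = extended_jackson([(0, 2, 1), (1, 1, 5), (0, 1, 3)])
```

With a heap for the available jobs and one initial sort, the rule runs in $O(n \log n)$ time.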

A.2 Proof of Theorem 2

Let us assume that there exists a polynomial $q_1$ such that the density $C_{I_\varepsilon}$ of $I_\varepsilon$ is bounded by $C_{I_\varepsilon}(n) \le q_1(n, 1/\varepsilon)$, for every $n \in \mathbb{N}$ and for any $\varepsilon > 0$. Since $\mathcal{P}$ is strongly NP-hard, there exists a polynomial $q_2$ such that the problem $\tilde{\mathcal{P}}$ obtained by restricting $\mathcal{P}$ to only those instances $x$ for which $\max(x) \le q_2(|x|)$ is NP-hard. We show that the assumption is equivalent to P=NP by reducing problem $\tilde{\mathcal{P}}$ to a problem $\mathcal{P}'$ whose underlying language is sparse and NP-complete.

For every $(x, y) \in I \times SOL$, the measure function $m(x, y)$ of $\mathcal{P}$ is defined to have values in $\mathbb{Q}$. It is however possible to transform any such optimization problem into an equivalent one such that $m(x, y)$ is a positive integer. We assume this in the following.

Consider the set $I(n) = \{x \in I : |x| = n\}$, for any $n \in \mathbb{N}$. For every $x \in I(n)$, if we set the error to $\varepsilon(n) = 1/(p(n, q_2(n)) + 1)$, then any $y^* \in SOL^*(f(x, \varepsilon(n)))$ can be transformed into the optimal solution for $x$ by means of function $g$. Indeed, since $\mathcal{P}$ is core reducible we have $m(x, g(x, y^*, \varepsilon(n))) \le (1 + \varepsilon(n)) \cdot m^*(x) < m^*(x) + 1$, where the last inequality is due to the fact that $m^*(x) \le p(|x|, \max(x)) \le p(n, q_2(n))$. From the integrality constraint on the measure function $m$, it follows that $m(x, g(x, y^*, \varepsilon(n))) = m^*(x)$, that is, $g(x, y^*, \varepsilon(n))$ is an optimal solution for $x$. The time complexity to compute $f(x, \varepsilon(n))$ and $g(x, y^*, \varepsilon(n))$ is polynomial in $|x|$. Let us partition the set $I(n)$ into equivalence classes, such that $x, z \in I(n)$ belong to the same class iff $f(x, \varepsilon(n)) = f(z, \varepsilon(n))$. Consider the set $I'(n)$ which contains for every equivalence class the corresponding core instance, i.e., $I'(n) = \{f(x, \varepsilon(n)) : x \in I(n)\}$. Let $q_3(n)$ be the polynomial limiting the computation time of $f(x, \varepsilon(n))$, for $x \in I(n)$. Then, for any $x \in I(n)$, $|f(x, \varepsilon(n))| \le q_3(n)$. If $|f(x, \varepsilon(n))| < n$ we increase the length of $f(x, \varepsilon(n))$ up to $n$, such that the modified instance is equivalent, except for the length, to the former (for instance, we can do that by appending a dummy symbol until length $n$ is reached). For simplicity, let us denote this modified length-increased set by $I'(n)$ again. It is easy to see that $I(n)$ polynomially reduces to $I'(n)$, since a polynomial time algorithm for $I'(n)$ implies the existence of a polynomial time algorithm for $I(n)$. According to the assumption, the cardinality of $I'(n)$ is bounded by a polynomial $q_4(n)$, where $q_4(n) = q_1(n, p(n, q_2(n)) + 1)$. Since for every $n \in \mathbb{N}$, $I(n)$ polynomially reduces to $I'(n)$, then $I = \bigcup_{n \in \mathbb{N}} I(n)$ polynomially reduces to $I' = \bigcup_{n \in \mathbb{N}} I'(n)$, and the density $C_{I'}$ of $I'$ is bounded by a polynomial; indeed $C_{I'}(n) = |\{x \in I' : |x| \le n\}| \le |\bigcup_{i \le n} I'(i)| \le \sum_{i \le n} q_4(i)$. Hence, problem $\tilde{\mathcal{P}}$ can be reduced in polynomial time to a problem $\mathcal{P}'$ with a sparse set of instances. It is easy to see that $\mathcal{P}'$ is NP-hard, and the underlying language is sparse and NP-complete. This is equivalent to P=NP [13].



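A.3 Proof of Theorem 3

The claim follows by proving that

$$\Pr\left[ Z < |I_\varepsilon^f| - \eta |I_\varepsilon| \right] \le \frac{\delta}{2} \qquad (2)$$

and

$$\Pr\left[ Z > |I_\varepsilon^f| + \eta |I_\varepsilon| \right] \le \frac{\delta}{2} \qquad (3)$$

for $N \ge \frac{4}{\eta^2} \ln \frac{2}{\delta}$.

Define $X = \sum_{i=1}^{N} X_i$, $\rho = |I_\varepsilon^f| / |I_\varepsilon|$ and the estimator $Z = |I_\varepsilon| X / N$. Then,

$$\Pr\left[ Z < |I_\varepsilon^f| - \eta |I_\varepsilon| \right] = \Pr\left[ X < N(\rho - \eta) \right] = \Pr\left[ X < N\rho \left( 1 - \frac{\eta}{\rho} \right) \right].$$

Since $E[X] = N\rho$, by a straightforward application of the Chernoff bound (see, e.g., [16]) to the rightmost term we obtain

$$\Pr\left[ Z < |I_\varepsilon^f| - \eta |I_\varepsilon| \right] \le \exp\left( -\frac{N \eta^2}{4 \rho} \right) \le \exp\left( -\frac{N \eta^2}{4} \right).$$

For the given lower bound on $N$ we obtain (2). Similarly, we can prove (3). It is easy to see that inequalities (2) and (3) imply

$$\Pr\left[ \, \left| Z - |I_\varepsilon^f| \right| \le \eta \, |I_\varepsilon| \, \right] \ge 1 - \delta.$$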
Abstract. Algorithms are the heart of computer science. They make systems work. The theory of algorithms, i.e., their design and their analysis, is a highly developed part of theoretical computer science [7].

In comparison, algorithmic software is in its infancy. For many fundamental algorithmic tasks no reliable implementations are available due to a lack of understanding of the principles underlying reliable algorithmic software; some examples are given during the talk. The challenge is

— to work out the principles underlying reliable algorithmic software and

— to create a comprehensive collection of reliable algorithmic software components.

I describe what I consider a major challenge in algorithmics, and then outline some avenues of attack.
Abstract. In this paper we introduce a new class of greedy heuristics for general job shop scheduling problems. In particular we deal with the classical job shop, i.e. with unlimited capacity buffer, and job shop problems with blocking and no-wait. The proposed algorithm family is a simple randomized greedy family based on a general formulation of the job shop problem. We report on an extensive study of the proposed algorithms, and comparisons with other greedy algorithms are presented.

1 Introduction 

A scheduling problem arises when a set of competing jobs requires processing on a set of machines. From the modeling point of view, most works on job shop scheduling problems are based on the disjunctive graph formulation of Roy and Sussman [23]. In this paper we deal with the alternative graph [12] model, a generalization of the disjunctive graph capable of modeling several constraints arising in real-world applications. In particular we deal with the classical job shop scheduling problem, i.e. with unlimited buffer, and with job shop scheduling problems with blocking and no-wait in process. As a solution technique we introduce a new class of randomized greedy heuristics based on the alternative graph formulation.

The paper is organized as follows. In Section 2 a brief survey on greedy algorithms is reported; in Section 3 we introduce the notation and the alternative graph formulation. In Section 4 we discuss the arc greedy heuristic algorithms and we report on our computational results. Some conclusions follow.

2 Literature Review 

In this section we review literature on greedy heuristics, these algorithms are 
constructive^ and aim to build a solution by repeatedly enlarging a consistent 
partial solution. 

Over the years, a great number of Priority Dispatching Rules (PDR) have been
proposed (see, for example, [5], [7], [10], [20]). These very fast one-pass
algorithms select an operation from a subset according to some simple heuristic
criterion, and schedule it at the end of the partial sequence. These procedures
can be either deterministic or randomized; for randomized heuristics, the best
solution over several runs is usually considered. It is likewise common to
consider the best dispatching solution over a set of different rules, as, for
example, in [11], [19]. However, as reported by several authors [11], [21], the
behavior of these dispatching rules is quite erratic, and no rule clearly
outperforms the others.

Other greedy techniques are based on insertion algorithms, which aim to build
the processing order of the operations on each machine. At each step an
operation is selected and inserted into the partial solution. It is worth noting
that the operation can be inserted at any point of the sequence, and not only at
the end as in the PDRs. Insertion algorithms were first applied to flow shop
scheduling by Nawaz et al. [17] and to the job shop by Werner and Winkler [25].
Moreover, both the INSA (INSertion Algorithm) of Nowicki and Smutnicki [18]
and the Bidir algorithm of Dell’Amico and Trubian [8] are insertion algorithms,
and each is used to build the initial solution of a tabu search algorithm.

The Shifting Bottleneck Procedure (SB) is a heuristic algorithm tailored to
the job shop scheduling problem. It was first proposed in 1988 by Adams, Balas
and Zawack [1]. The Shifting Bottleneck procedure is a constructive algorithm
that iteratively schedules a single machine problem, using the Carlier [6]
approach. Balas, Lenstra and Vazacopoulos [4] consider as one-machine
relaxation for the SB procedure the one-machine problem with delayed precedence
constraints and solve it to optimality. At each step m single machine relaxed
problems are solved, and the critical (bottleneck) machine is first identified
and then scheduled. Every time a machine is scheduled, all the previously
scheduled machines are kept fixed. As a stand-alone heuristic, the Shifting
Bottleneck procedure performs better than the simple Priority Dispatching
Rules, but it requires more computation time. Better results are obtained by
reiterating the SB procedure.

The Beam Search technique is an approach to improve the performance of
single heuristics. Beam search is a heuristic search strategy [15] that consists
of a limited breadth-first visit of the branching tree. At each step of the
search process only the best $\beta$ candidates are maintained, and all the
other candidates are discarded. The parameter $\beta$ is called the beam width,
and it influences the performance of the algorithm. The beam search procedure
has been applied to job shop scheduling by Sabuncuoglu and Bayiz [24], who
superimposed it on priority dispatching rules.

3 The Alternative Graph Formulation 

In this section we introduce the alternative graph formulation [12], [13]. The
job shop scheduling problem is the problem of allocating machines to competing
jobs over time, subject to the constraint that each machine can handle at most
one job at a time. The sequence of machines for each job is prescribed. The
processing of a job on a machine is called an operation; it requires a fixed
processing time and cannot be interrupted. Each job consists of a sequence of
operations that have to be processed in a specified order, which is known in
advance. The goal is to minimize the makespan, i.e. the completion time of the
last operation.

We focus on the sequencing of operations rather than on the sequencing of
jobs. We have therefore a set of operations $\{o_0, o_1, \ldots, o_n\}$ which
have to be performed on $m$ machines $\{m_1, m_2, \ldots, m_m\}$. Each operation
$o_i$ requires a specified amount of processing time $p_i$ on a specified
machine $M(i)$, and cannot be interrupted from its starting time $t_i$ to its
completion time $c_i = t_i + p_i$. $o_0$ and $o_n$ are dummy operations, with
zero processing time, that we call “start” and “finish” respectively. Each
machine can process only one operation at a time.

There is a set of precedence relations among operations. A precedence
relation $(i,j)$ is a constraint on the starting time of operation $o_j$ with
respect to $t_i$. More precisely, the starting time of the successor $o_j$ must
be greater than or equal to the starting time of the predecessor $o_i$ plus a
given delay $f_{ij}$, which in our model can be positive, null or negative.
Finally, we assume that $o_0$ precedes $o_1, \ldots, o_n$, and $o_n$ follows
$o_0, \ldots, o_{n-1}$. Precedence relations are divided into two sets: fixed
and alternative. Alternative precedence relations are partitioned into pairs.
They usually represent the constraint that each machine can process only one
operation at a time.

A schedule is an assignment of starting times $t_0, t_1, \ldots, t_n$ to
operations $o_0, o_1, \ldots, o_n$ respectively, such that all fixed precedence
relations, and exactly one for each pair of the alternative precedence
relations, are satisfied. Without loss of generality we assume $t_0 = 0$. The
goal is to minimize the starting time of operation $o_n$. This problem can
therefore be formulated as a particular disjunctive program, i.e. a linear
program with logical conditions involving the operations “and” ($\wedge$,
conjunction) and “or” ($\vee$, disjunction), as in [3].

Problem 1

\[
\begin{array}{lll}
\min & t_n - t_0 & \\
\text{s.t.} & t_j - t_i \ge f_{ij} & (i,j) \in F \\
 & (t_j - t_i \ge a_{ij}) \vee (t_k - t_h \ge a_{hk}) & ((i,j),(h,k)) \in A
\end{array}
\]



Associating a node with each operation, Problem 1 can be usefully represented
by the triple $G = (N, F, A)$ that we call the alternative graph. The
alternative graph is defined as follows. There is a set of nodes $N$, a set of
directed arcs $F$ and a set of pairs of directed arcs $A$. Arcs in the set $F$
are fixed, and $f_{ij}$ is the length of arc $(i,j) \in F$. Arcs in the set $A$
are alternative. If $((i,j),(h,k)) \in A$, we say that $(i,j)$ and $(h,k)$ are
paired and that $(i,j)$ is the alternative of $(h,k)$. Let $a_{ij}$ be the
length of the alternative arc $(i,j)$. In our model the arc length can be
positive, null or negative.

A selection $S$ is a set of arcs obtained from $A$ by choosing at most one
arc from each pair. The selection is complete if exactly one arc from each pair
is chosen. Given a selection $S$, let $G(S)$ indicate the graph
$(N, F \cup S)$. A selection $S$ is consistent if the graph $G(S)$ has no
positive length cycles. With this notation each schedule is associated with a
complete consistent selection on the corresponding alternative graph. By
definition, the makespan of a consistent selection $S$ is the length of the
longest path from node $0$ to node $n$ in $G(S)$, which we call the critical
path. The makespan of a schedule is therefore the makespan of the associated
complete consistent selection. Moreover, the critical path is defined even if
the current selection $S$ is partial and consistent. Given a selection $S$, we
denote the value of a longest path from $i$ to $j$ in $G(S)$ by $l^S(i,j)$ or
simply $l(i,j)$. We use the convention that if there is no path from $i$ to
$j$, then $l(i,j) = -\infty$.
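
As an illustration of these definitions, the following minimal sketch checks whether a selection is consistent and, if it is, returns the longest path lengths from a source node. The function name `longest_paths`, the representation of arcs as `(i, j, length)` triples and the Bellman-Ford style relaxation are illustrative assumptions, not part of the original formulation. The sketch also assumes that every node is reachable from the source, as is the case for node 0 in the model.

```python
from math import inf


def longest_paths(num_nodes, arcs, source):
    """Longest path lengths from `source` in the graph G(S).

    arcs: iterable of (i, j, length) triples, i.e. the arcs of F united with S.
    Returns a list l with l[j] = length of a longest path from source to j
    (-inf when j is unreachable), or None when a positive length cycle is
    reachable from source, i.e. the selection is not consistent.
    """
    arcs = list(arcs)
    dist = [-inf] * num_nodes
    dist[source] = 0
    for _ in range(num_nodes - 1):
        changed = False
        for i, j, w in arcs:
            if dist[i] != -inf and dist[i] + w > dist[j]:
                dist[j] = dist[i] + w
                changed = True
        if not changed:
            return dist
    # One extra pass: any further improvement implies a positive cycle.
    for i, j, w in arcs:
        if dist[i] != -inf and dist[i] + w > dist[j]:
            return None
    return dist
```

With nodes 0..3 and arcs `(0, 1, 0)`, `(1, 2, 3)`, `(2, 3, 2)`, adding the arc `(2, 1, -3)` closes a cycle of length zero, which is still consistent, while adding `(2, 1, -2)` closes a cycle of length 1 and the function returns `None`.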

The alternative graph [12], [13] is a generalization of the disjunctive graph
of Roy and Sussman [23]. In fact, in the disjunctive graph the pairs of
alternative arcs (called disjunctive arcs) are all of the form
$((i,j),(j,i))$, where $i$ and $j$ are two operations to be processed on the
same machine. With the alternative graph formulation it is possible to model
precisely, and to solve effectively, a number of complex practical scheduling
issues for which there were no successful methodologies so far, see for
instance [14], [22]. In this paper we refer mainly to four job shop scheduling
problems, namely the ideal (i.e. with infinite capacity buffer), the blocking
with swap allowed (i.e. with zero buffer assumption), the blocking with no swap
(i.e. with zero buffer and without swaps) and the no-wait (i.e. jobs are not
allowed to idle). See the paper of Mascis and Pacciarelli [13] for more details
on the alternative graph formulation of the considered constraints.
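
To make the disjunctive special case concrete, the sketch below builds the alternative graph of a small ideal job shop instance. The function name `build_ideal_job_shop`, the input layout and the arc lengths are assumptions made for illustration: following the usual disjunctive graph convention, every arc leaving an operation carries that operation's processing time as its delay.

```python
from itertools import combinations


def build_ideal_job_shop(jobs):
    """Build (num_nodes, fixed_arcs, pairs) for an ideal job shop.

    jobs: list of jobs, each a list of (machine, processing_time) in job order.
    Node 0 is "start", the last node is "finish"; operations are numbered
    1, 2, ... job by job. Arcs are (i, j, length) triples.
    """
    fixed, on_machine, last_ops = [], {}, []
    node = 0
    for job in jobs:
        prev = None
        for machine, p in job:
            node += 1
            on_machine.setdefault(machine, []).append((node, p))
            if prev is None:
                fixed.append((0, node, 0))              # start precedes first op
            else:
                fixed.append((prev[0], node, prev[1]))  # job order, delay p_prev
            prev = (node, p)
        if prev is not None:
            last_ops.append(prev)
    finish = node + 1
    fixed += [(i, finish, p) for i, p in last_ops]      # last ops precede finish
    # One pair ((i, j), (j, i)) for every two operations sharing a machine.
    pairs = [((i, j, pi), (j, i, pj))
             for same in on_machine.values()
             for (i, pi), (j, pj) in combinations(same, 2)]
    return finish + 1, fixed, pairs
```

For two jobs `[("M1", 2), ("M2", 3)]` and `[("M2", 4)]`, the only alternative pair is `((2, 3, 3), (3, 2, 4))`, since operations 2 and 3 compete for machine M2.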

4 Arc Greedy Heuristics 

In this section we introduce the Arc Greedy Heuristics (AGH), a family of greedy
randomized heuristic algorithms. These heuristics are a randomized extension of
the heuristics introduced in Mascis and Pacciarelli [12]. All the heuristics
in the AGH family have the same algorithmic structure and differ only in the
evaluation criterion applied to select, at each step, the next arc in $A$.
These heuristics are based on the idea of repeatedly extending a feasible
selection. In particular, at each step, the algorithms select one unselected
pair from the set $A$. The step is repeated until a feasible solution is built
or a positive length cycle is detected. Given an alternative graph
$G = (N, F, A)$, we define $L_{ij}$ as the length of the critical path passing
through the arc $(i,j)$, i.e. the following quantity:

\[
L_{ij} = l(0,i) + a_{ij} + l(j,n). \qquad (1)
\]

Clearly $L_{ij}$ can be considered as a myopic evaluation, and a rough lower
bound, of the resulting makespan in the partially selected alternative graph
obtained by selecting the alternative arc $(i,j)$. At each step of the algorithm
and for each unselected pair $(u,v) = ((i,j),(h,k))$, the values $L_{ij}$ and
$L_{hk}$ are computed.
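
The quantity $L_{ij} = l(0,i) + a_{ij} + l(j,n)$ can be evaluated with two longest path computations, one from the start node and one towards the finish node. The sketch below is a minimal illustration: the names `longest_from` and `arc_value` and the `(i, j, length)` arc triples are hypothetical, and the current selection is assumed to be consistent (no positive length cycle).

```python
from math import inf


def longest_from(num_nodes, arcs, source):
    """Longest path lengths from source; assumes no positive length cycle."""
    dist = [-inf] * num_nodes
    dist[source] = 0
    for _ in range(num_nodes - 1):
        for i, j, w in arcs:
            if dist[i] + w > dist[j]:
                dist[j] = dist[i] + w
    return dist


def arc_value(num_nodes, arcs, i, j, a_ij):
    """L_ij = l(0, i) + a_ij + l(j, n) in the current graph G(S)."""
    n = num_nodes - 1
    from_start = longest_from(num_nodes, arcs, 0)
    # Longest paths towards n are longest paths from n in the reversed graph.
    reversed_arcs = [(v, u, w) for u, v, w in arcs]
    to_finish = longest_from(num_nodes, reversed_arcs, n)
    return from_start[i] + a_ij + to_finish[j]
```

For example, take nodes 0..4 with fixed arcs `(0, 1, 0)`, `(1, 2, 2)`, `(2, 4, 3)`, `(0, 3, 0)`, `(3, 4, 4)` and the pair made of arc (2, 3) with length 3 and arc (3, 2) with length 4. Then $L_{23} = 2 + 3 + 4 = 9$ and $L_{32} = 0 + 4 + 3 = 7$.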

We define the heuristic criterion function $HC((u,v))$ as an operator that,
applied to an unselected pair $(u,v) \in A$, returns the integer value
associated with that possible choice. Note that, in what follows, we use the
term heuristic for the Arc Greedy Heuristic algorithm independently of the
specific heuristic criterion, and the term heuristic criterion for both the
function $HC(\cdot)$ and the Arc Greedy Heuristic algorithm with that specified
heuristic criterion.

The main difference between an Arc Greedy Heuristic and a priority dispatching
rule or an insertion algorithm is that the latter algorithms schedule an
unscheduled operation at each step. In terms of the alternative graph,
scheduling an operation $o_i$ means directing all the outgoing alternative arcs
of the node $i$. An Arc Greedy Heuristic, instead, selects only one unselected
pair in $A$ at each step. Therefore an Arc Greedy Heuristic requires a higher
number of steps to build a solution, and thus a higher computation time, but it
has more freedom of choice and can yield better results.

4.1 Algorithmic Description 

We now describe in more detail the proposed greedy algorithm and the different
heuristic criterion functions implemented. The general sketch of the AGH is
shown in Figure 1.

At each step the Arc Greedy Heuristic chooses the next unselected pair
$(u,v) \in A$ according to a pseudo-random proportional rule [9]. With
probability $q_0$, where $0 \le q_0 \le 1$, a deterministic choice is made and
the best pair is selected, that is $(u,v) = \arg\min_{(a,b) \in A} HC((a,b))$.
With probability $1 - q_0$, instead, the algorithm makes a random choice with
uniform probability in a cardinality-based Restricted Candidate List (RCL). The
Restricted Candidate List contains the best elements according to the heuristic
criterion. The size of the Restricted Candidate List is $[\alpha |A|]$, where
$0 \le \alpha \le 1$ is an input parameter of the algorithm and $|A|$ is the
number of unselected pairs in the current alternative graph. Clearly the size
of the RCL, i.e. the number of candidates in the RCL, depends on the parameter
$\alpha$ and on the number of unselected pairs in $G(S)$. Since the number of
unselected pairs decreases at every step of the algorithm, the RCL contains
more elements at the beginning of a run than at the end.

Algorithm Arc Greedy Heuristic
Input:
    Alternative graph G = (N, F, A)
    Heuristic criterion HC(·)
    Parameters q_0, α
begin
    S := ∅
    while A ≠ ∅ do
        with probability q_0 choose the pair (u, v) ∈ A minimizing HC(·)
        with probability 1 - q_0 choose randomly a pair (u, v) ∈ A ∩ RCL(α)
        Select one arc belonging to (u, v)
        Update S := S ∪ {u} and A := A - {(u, v)}
        if G(S) is unfeasible then STOP, failed in finding a feasible solution
    end
end.

Fig. 1. The general sketch of the AGH algorithm
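
The pseudo-random proportional rule with a cardinality-based RCL can be sketched as follows. This is only an illustration under stated assumptions: the name `choose_pair` is hypothetical, the criterion `hc` is assumed to be oriented so that lower values are more attractive, and the RCL size is rounded up and kept at one element or more, a rounding choice the text does not specify.

```python
import math
import random


def choose_pair(pairs, hc, q0, alpha, rng=random):
    """Choose the next unselected pair.

    pairs: list of unselected pairs; hc: function pair -> value (lower is
    better); q0: probability of the deterministic choice; alpha: relative
    size of the Restricted Candidate List; rng: random module or Random.
    """
    ranked = sorted(pairs, key=hc)
    if rng.random() < q0:
        return ranked[0]                       # deterministic: best pair
    # Assumption: round the RCL size up and never let it be empty.
    size = max(1, math.ceil(alpha * len(ranked)))
    return rng.choice(ranked[:size])           # uniform choice inside the RCL
```

With `q0 = 1` the function always returns the best pair, and with `q0 = 0` and `alpha = 1` every unselected pair is equally likely, which matches the two extreme configurations of the algorithm.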

Fixing the input parameters to $q_0 = 1$ and $\alpha = 0$, the AGH has a
deterministic behavior, i.e. it always chooses the best unselected pair in $A$;
whereas with $q_0 = 0$ and $\alpha = 1$ a completely unbiased random choice is
made. Note that if a completely random choice is made, then the heuristic
criterion adopted has no influence on the performance of the algorithm, i.e.
all the Arc Greedy Heuristic algorithms behave in the same way.

Each criterion defines a function $HC((u,v))$ that takes as input an
alternative pair belonging to the set $A$ of alternative pairs and gives as
output an integer value. In the following we describe in detail the different
heuristic criteria proposed.

Once the pair $(u,v) = ((i,j),(h,k)) \in A$ is chosen, the algorithm has to
decide which of its two arcs, $(i,j)$ or $(h,k)$, to select. The choice depends
on which heuristic criterion is adopted. We distinguish two different orderings
for the values given by the heuristic criteria: increasing and decreasing. By
increasing ordering we mean that the lower values of the heuristic criterion
are more attractive for the algorithm and thus are selected first (we call this
strategy “Select”), whereas in the decreasing ordering the higher values of the
heuristic criterion are sorted in the first positions (and we refer to this
strategy as “Avoid”).

When selecting one arc in the chosen alternative pair $(u,v)$ it is possible
that the resulting selection is not consistent, i.e. the graph $G(S)$ has a
positive length cycle and thus the schedule is unfeasible. In this case the AGH
algorithm tries to select the other alternative arc in the pair. If the
resulting selection is again not consistent, then the procedure stops, failing
to find a feasible solution. If, on the contrary, the resulting selection is
consistent, the algorithm continues the run. It has to be pointed out that in
the case of the ideal job shop scheduling problem all the Arc Greedy Heuristics
are always able to find a feasible schedule.
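
Putting the pieces together, the following sketch runs the whole construction loop, including the fallback to the other arc when the first choice creates a positive length cycle. It is not the authors' implementation: the names `longest` and `agh`, the arc triples, the rounding of the RCL size and the recomputation of all longest paths at every step are simplifying assumptions. The ranking used here is an AMCC-like rule (pairs with the largest max of the two arc values first, then the arc with the smaller value).

```python
import math
import random
from math import inf


def longest(num_nodes, arcs, source):
    """Longest path lengths from source, or None on a positive length cycle."""
    dist = [-inf] * num_nodes
    dist[source] = 0
    for _ in range(num_nodes):
        changed = False
        for i, j, w in arcs:
            if dist[i] + w > dist[j]:
                dist[j], changed = dist[i] + w, True
        if not changed:
            return dist
    return None  # still improving after num_nodes passes: positive cycle


def agh(num_nodes, fixed, pairs, q0, alpha, rng):
    """Arc greedy sketch; returns (makespan, arcs of G(S)) or None on failure."""
    arcs, pending = list(fixed), list(pairs)
    n = num_nodes - 1
    while pending:
        fwd = longest(num_nodes, arcs, 0)
        bwd = longest(num_nodes, [(j, i, w) for i, j, w in arcs], n)

        def value(arc):                  # L_ij = l(0, i) + a_ij + l(j, n)
            i, j, a = arc
            return fwd[i] + a + bwd[j]

        # Avoid ordering: the largest max{L_ij, L_hk} comes first.
        ranked = sorted(pending, reverse=True,
                        key=lambda p: max(value(p[0]), value(p[1])))
        if rng.random() < q0:
            pair = ranked[0]
        else:
            size = max(1, math.ceil(alpha * len(ranked)))  # assumed rounding
            pair = rng.choice(ranked[:size])
        pending.remove(pair)
        # Try the arc with the smaller value first, then its alternative.
        for arc in sorted(pair, key=value):
            if longest(num_nodes, arcs + [arc], 0) is not None:
                arcs.append(arc)
                break
        else:
            return None                  # both arcs close a positive cycle
    return longest(num_nodes, arcs, 0)[n], arcs
```

On the two-job example with fixed arcs `(0, 1, 0)`, `(1, 2, 2)`, `(2, 4, 3)`, `(0, 3, 0)`, `(3, 4, 4)` and the single pair `((2, 3, 3), (3, 2, 4))`, the sketch selects the arc (3, 2) and returns a makespan of 7, whereas the alternative arc would give 9.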

In Table 1 we summarize the classification of our heuristics. The names of the
heuristic criteria appear in the cells of the table: the rows report the
ordering type of the criterion values, whereas the columns report the operator
applied to $L_{ij}$ and $L_{hk}$. The first letter of the name is “A”void
(“S”elect) and reflects the decreasing (increasing) ordering adopted. In Select
heuristics the algorithm tries to make the best choice, whereas in the Avoid
heuristics the algorithm tries to avoid the worst choice. The next two letters
represent the operator applied to $L_{ij}$ and $L_{hk}$. “M”ost “C”ritical
means the maximum whereas “L”ess “C”ritical stands for the minimum, “M”ost
“S”imilar is the sum, and finally “M”ost “B”alanced is the absolute value of
the difference between $L_{ij}$ and $L_{hk}$. The last letter always stands for
“P”air, with the only exception of the AMCC criterion, where the last letter
stands for “C”ompletion time. We decided to use the name AMCC instead of AMCP
in order to maintain the same names as in [12].

Table 1. Classification of the heuristics

Ordering     max{L_ij, L_hk}   min{L_ij, L_hk}   L_ij + L_hk   |L_ij - L_hk|
Decreasing   AMCC              ALCP              AMSP          AMBP
Increasing   SMCP              SLCP              SMSP          SMBP

For example, the AMCC (Avoid Most Critical Completion time) criterion
gives as output $\max\{L_{ij}, L_{hk}\}$ and the SLCP (Select Less Critical
Pair) returns $\min\{L_{ij}, L_{hk}\}$. Once the pair $((i,j),(h,k)) \in A$ is
chosen, the algorithm has to decide which arc to select. If the heuristic
criterion uses a decreasing ordering (Avoid type), then the arc $(i,j)$ is
selected if $L_{ij} < L_{hk}$. In the case of an increasing ordering (Select
type) of the heuristic criterion, the arc $(i,j)$ is selected if
$L_{ij} > L_{hk}$.
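
The naming scheme of the eight criteria lends itself to a compact encoding. The sketch below is illustrative only: the names `OPERATORS`, `hc_value`, `rank_pairs` and `pick_arc` are hypothetical, and ties between the two arc values are resolved in favor of the second arc, a case the text leaves open.

```python
# Operator selected by the two middle letters of the criterion name.
OPERATORS = {
    "MC": lambda a, b: max(a, b),   # Most Critical
    "LC": lambda a, b: min(a, b),   # Less Critical
    "MS": lambda a, b: a + b,       # Most Similar
    "MB": lambda a, b: abs(a - b),  # Most Balanced
}


def hc_value(name, l_ij, l_hk):
    """Value of criterion `name` (e.g. "AMCC", "SLCP") on a pair."""
    return OPERATORS[name[1:3]](l_ij, l_hk)


def rank_pairs(name, values):
    """Order pairs, most attractive first.

    values: dict pair -> (L_ij, L_hk). Avoid criteria ("A...") use the
    decreasing ordering, Select criteria ("S...") the increasing one.
    """
    decreasing = name[0] == "A"
    return sorted(values, key=lambda p: hc_value(name, *values[p]),
                  reverse=decreasing)


def pick_arc(name, l_ij, l_hk):
    """Return 0 to select arc (i, j) or 1 to select arc (h, k)."""
    if name[0] == "A":                   # Avoid: take (i, j) if L_ij < L_hk
        return 0 if l_ij < l_hk else 1
    return 0 if l_ij > l_hk else 1       # Select: take (i, j) if L_ij > L_hk
```

For a pair with $L_{ij} = 9$ and $L_{hk} = 7$, AMCC evaluates the pair at 9 and selects the arc $(h,k)$, while SLCP evaluates it at 7.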



5 Experimental Results 

The performance of each algorithm is analyzed on a set of 54 benchmark
instances from the literature on the job shop scheduling problem.

We consider three problems (ABZ5-7) generated by Adams, Balas & Zawack
[1]; ten 10 x 10 problems (ORB01-10) generated by Applegate and Cook [2]; and
40 problems of different sizes (LA01-40) generated by Lawrence [11]. In
addition, we consider the famous instance (FT10) proposed by Fisher and
Thompson in a book edited by Muth and Thompson [16]. The size of the benchmark
instances varies from 10 to 30 jobs and from 5 to 20 machines.

For all the considered benchmark problems the number of alternative pairs
is $|A| = m \, n (n - 1)/2$, where $m$ is the number of machines and $n$ is
the number of jobs. Note that the expression holds because each job has to
perform exactly one operation on each machine. In Table 2 the size of the
problems is given in terms of jobs, machines and alternative pairs.
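
As a quick check of the count of alternative pairs, each machine contributes one pair for every two jobs, i.e. $n(n-1)/2$ pairs, which gives $m \, n (n-1)/2$ in total. The helper below (the name `num_alternative_pairs` is hypothetical) reproduces the sizes listed in Table 2.

```python
def num_alternative_pairs(n_jobs, n_machines):
    """Number of alternative pairs when every job visits every machine once."""
    return n_machines * n_jobs * (n_jobs - 1) // 2


# (jobs, machines) sizes of the benchmark instances.
SIZES = [(10, 5), (15, 5), (20, 5), (10, 10), (15, 10),
         (20, 10), (30, 10), (15, 15), (15, 20)]
PAIR_COUNTS = [num_alternative_pairs(n, m) for n, m in SIZES]
```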



Table 2. Sizes of the problems

# Jobs   # Machines   # Alternative Pairs
10       5            225
15       5            525
20       5            950
10       10           450
15       10           1050
20       10           1900
30       10           4350
15       15           1575
15       20           2100




Marco Pranzo et al. 



It has to be noticed that the AGH algorithms often fail to find a feasible
solution to the more constrained problems, namely the blocking no swap, the
blocking with swap allowed and the no-wait. For this reason, in this section
only the results for the ideal job shop problem (IJSP) are presented. The issue
of finding feasible solutions in constrained cases has to be tackled by
metaheuristic algorithms, and it will be investigated in future research.

For each algorithm we considered the following configurations of the
parameters $q_0$ and $\alpha$. We defined two sets $Q$ and $\mathcal{A}$ of
possible values of the input parameters:
$q_0 \in Q = \{0, 0.25, 0.50, 0.80, 0.90, 0.95\}$ and
$\alpha \in \mathcal{A} = \{0.05, 0.10, 0.20, 0.50, 0.75, 1\}$. To this set of
configurations we added the deterministic configuration $q_0 = 1$ and
$\alpha = 0$.

In Table 3, for all the different heuristic criteria, we report a comparison
between the deterministic version of the heuristic criteria and the results
obtained by the best configuration over the 54 test instances. The average
relative error (Error (%)) is obtained as $100 \cdot (AGH - Opt)/Opt$, where
$AGH$ is the value of the solution found by the heuristic and $Opt$ is the
optimal value of the instance. The first column reports the heuristic criterion
adopted. In the next two columns the results of the deterministic configuration
are shown. In columns 4 and 5 we show the best configuration and the average
relative error over twenty runs respectively, whereas in the last two columns
the results of the best over twenty repetitions are shown. It can be noticed
that, on a single run of the AGH, the deterministic configuration is more
effective, at least for the best performing heuristic criteria. However, if we
consider the best over several runs, then the best configurations are no longer
the deterministic ones.
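
The relative error used in the tables is straightforward to compute. The helper below is a minimal sketch (the name `relative_error` is hypothetical); the sample values are the ABZ5 and ABZ6 entries of the deterministic AMCC configuration.

```python
def relative_error(agh_value, opt_value):
    """Relative error in percent: 100 * (AGH - Opt) / Opt."""
    return 100 * (agh_value - opt_value) / opt_value


# ABZ5: heuristic solution 1318, optimum 1234.
ABZ5_ERROR = round(relative_error(1318, 1234), 2)
# ABZ6: heuristic solution 985, optimum 943.
ABZ6_ERROR = round(relative_error(985, 943), 2)
```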

In Figure 2 we plot the average computational times required by three
different AMCC configurations. The times refer to a single run on a Pentium III
900 MHz processor. We consider the deterministic configuration ($q_0 = 1$ and
$\alpha = 1$), the best configuration ($q_0 = 0.80$ and $\alpha = 0.20$) and
the random configuration ($q_0 = 0$ and $\alpha = 1$). The computational times
do not depend on the heuristic criterion adopted, but rather on the
configuration: in fact the deterministic configuration is almost twice as fast
as the random configuration.

Table 3. Comparison between the deterministic configuration and the best
configuration (average on a single run and best over twenty runs)

Heuristic   Deterministic              Single Run                 Twenty Runs
Criterion   (q_0, α)       Error (%)   (q_0, α)       Error (%)   (q_0, α)       Error (%)
AMCC        (1.00, 0.00)    8.34       (0.80, 0.10)    8.91       (0.80, 0.10)    4.15
AMSP        (1.00, 0.00)   14.24       (0.90, 0.05)   14.91       (0.90, 0.05)    7.94
AMBP        (1.00, 0.00)   14.85       (0.95, 1.00)   15.41       (0.95, 1.00)    8.04
SLCP        (1.00, 0.00)   21.45       (0.50, 1.00)   20.25       (0.50, 1.00)   11.49
SMCP        (1.00, 0.00)   30.49       (0.00, 1.00)   21.10       (0.00, 1.00)   11.94
SMBP        (1.00, 0.00)   41.24       (0.00, 0.75)   21.13       (0.00, 0.75)   11.95
ALCP        (1.00, 0.00)   27.59       (0.00, 0.20)   21.88       (0.00, 0.20)   12.02
SMSP        (1.00, 0.00)   27.17       (0.00, 1.00)   21.25       (0.00, 1.00)   12.50

Fig. 2. Computational time of a single run for AMCC heuristics (horizontal
axis: Size)

In order to analyze the influence of the two parameters ($q_0$ and $\alpha$)
on the quality of the achieved solutions, we define the randomness of a
configuration as $R = 100 \cdot (\alpha + 1 - q_0)$. Clearly $R = 0$ for the
deterministic configuration, i.e. $q_0 = 1$ and $\alpha = 0$, whereas the
maximum value of the randomness is obtained for $q_0 = 0$ and $\alpha = 1$,
which corresponds to a completely unbiased random choice. In Figure 3 we show
the best results obtained over twenty repetitions by varying the randomness of
the Arc Greedy Heuristic. We are aware that combining $q_0$ and $\alpha$ in a
single sum is an arbitrary choice, but there is no significant difference
between the results achieved by two configurations with the same sum.
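
The randomness measure and the tested grid of configurations can be written down directly. In the sketch below the names `randomness` and `configurations` are hypothetical; the parameter values are those listed for the sets of $q_0$ and $\alpha$, plus the deterministic configuration.

```python
Q_VALUES = [0, 0.25, 0.50, 0.80, 0.90, 0.95]
ALPHA_VALUES = [0.05, 0.10, 0.20, 0.50, 0.75, 1]


def randomness(q0, alpha):
    """Randomness of a configuration: R = 100 * (alpha + 1 - q0)."""
    return 100 * (alpha + 1 - q0)


def configurations():
    """Deterministic configuration followed by the 6 x 6 grid."""
    grid = [(q0, alpha) for q0 in Q_VALUES for alpha in ALPHA_VALUES]
    return [(1, 0)] + grid
```

The grid yields 37 configurations, with R ranging from 0 for the deterministic configuration to 200 for the completely random one.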

The proposed heuristic criteria can be divided into two classes with two
different behaviors. The first class is composed of the AMCC, AMBP and AMSP
heuristic criteria, and the second class is composed of the remaining five
heuristic criteria. The best results for the first class are obtained for
values of the parameters quite close to the deterministic configuration,
whereas the best results for the heuristics in the second class are obtained
for configurations near the completely random choice, and the worst results are
obtained for the deterministic configuration.

In the following we consider only the AMCC criterion because it clearly
outperforms all the other heuristic criteria, as shown in Figure 3 and in
Table 3. In Table 4 we report the results for three selected configurations of
the AGH algorithm, i.e. the deterministic AMCC(1.00, 0.00), the
AMCC(0.80, 0.10) (the median over twenty runs), and the AMCC(0.80, 0.10)_20
(the best over twenty runs). In columns 1, 2 and 3 we report the name of the
instance, the size of the instance (n, m) and the value of the optimal
solution, respectively. In the subsequent columns we report, for each instance,
the solutions found by the heuristics and the relative distance from the
optimum (percentage).

Table 4. Performance of the AMCC heuristics

Instance  Size     Optimal     AMCC(1.00,0.00)     AMCC(0.80,0.10)  AMCC(0.80,0.10)_20
          (n, m)     value  Solution Error (%)  Solution Error (%)  Solution Error (%)
ABZ5      (10,10)     1234      1318      6.81    1346.0      9.08      1287      4.29
ABZ6      (10,10)      943       985      4.45     974.0      3.29       948      0.53
ABZ7      (15,20)      656       753     14.96     774.5     18.06       744     13.41
FT10      (10,10)      930       985      5.91    1019.5      9.62       969      4.19
ORB01     (10,10)     1059      1213     14.54    1280.0     20.87      1219     15.11
ORB02     (10,10)      888       924      4.05     928.0      4.50       916      3.15
ORB03     (10,10)     1005      1113     10.75    1147.5     14.18      1073      6.77
ORB04     (10,10)     1005      1108     10.25    1112.0     10.65      1068      6.27
ORB05     (10,10)      887       924      4.17     953.0      7.44       905      2.03
ORB06     (10,10)     1010      1107      9.60    1123.0     11.19      1069      5.84
ORB07     (10,10)      397       440     10.83     440.0     10.83       421      6.05
ORB08     (10,10)      899       950      5.67     975.0      8.45       917      2.00
ORB09     (10,10)      934      1015      8.67    1015.0      8.67       976      4.50
ORB10     (10,10)      944      1030      9.11    1028.0      8.90       958      1.48
LA01      (10,5)       666       666      0.00     666.0      0.00       666      0.00
LA02      (10,5)       655       694      5.95     691.5      5.57       656      0.15
LA03      (10,5)       597       735     23.12     690.5     15.66       617      3.35
LA04      (10,5)       590       679     15.08     625.0      5.93       599      1.53
LA05      (10,5)       593       593      0.00     593.0      0.00       593      0.00
LA06      (15,5)       926       926      0.00     932.0      0.65       926      0.00
LA07      (15,5)       890       984     10.56     911.5      2.42       890      0.00
LA08      (15,5)       836       873      1.16     899.0      7.54       863      3.23
LA09      (15,5)       951       986      3.68     951.0      0.00       951      0.00
LA10      (15,5)       958      1009      5.32     958.0      0.00       958      0.00
LA11      (20,5)      1222      1239      1.39    1253.5      2.58      1222      0.00
LA12      (20,5)      1039      1039      0.00    1059.5      1.97      1039      0.00
LA13      (20,5)      1150      1161      0.96    1166.5      1.43      1150      0.00
LA14      (20,5)      1292      1305      1.01    1292.0      0.00      1292      0.00
LA15      (20,5)      1207      1369     13.42    1384.5     14.71      1276      5.72
LA16      (10,10)      945       979      3.60    1002.5      6.08       973      2.96
LA17      (10,10)      784       800      2.04     813.0      3.70       800      2.04
LA18      (10,10)      848       916      8.02     905.5      6.78       876      3.30
LA19      (10,10)      842       846      0.48     861.0      2.26       846      0.48
LA20      (10,10)      902       930      3.10     930.0      3.10       913      1.22
LA21      (15,10)     1046      1241     18.64    1167.5     11.62      1132      8.22
LA22      (15,10)      927      1032     11.33    1024.0     10.46       985      6.26
LA23      (15,10)     1032      1131      9.59    1120.5      8.58      1066      3.29
LA24      (15,10)      935       999      6.84    1046.0     11.87      1014      8.45
LA25      (15,10)      977      1071      9.62    1087.5     11.31      1039      6.35
LA26      (20,10)     1218      1378     13.14    1363.5     11.95      1249      2.55
LA27      (20,10)     1235      1353      9.55    1441.0     16.68      1332      7.85
LA28      (20,10)     1216      1322      8.72    1377.5     13.28      1308      7.57
LA29      (20,10)     1152      1392     20.83    1403.0     21.78      1315     14.14
LA30      (20,10)     1355      1476      8.93    1473.0      8.71      1392      2.73
LA31      (30,10)     1784      1871      4.88    1939.5      8.72      1858      4.15
LA32      (30,10)     1850      1942      4.97    2006.0      8.43      1957      5.78
LA33      (30,10)     1719      1897     10.35    1862.5      8.35      1785      3.84
LA34      (30,10)     1721      1934     12.38    1922.0     11.68      1783      3.60
LA35      (30,10)     1888      2017      6.83    2020.5      7.02      1924      1.91
LA36      (15,15)     1268      1347      6.23    1381.5      8.95      1329      4.81
LA37      (15,15)     1397      1547     10.74    1535.5      9.91      1462      4.65
LA38      (15,15)     1196      1342     12.21    1395.5     16.68      1324     10.70
LA39      (15,15)     1233      1361     10.38    1417.0     14.92      1366     10.79
LA40      (15,15)     1222      1340      9.66    1343.0      9.90      1297      6.14
Median error                                 8.34%               8.69%               3.32%
Average error                                7.86%               8.46%               4.13%
Maximum error                               23.12%              21.78%              15.11%

Fig. 3. The influence of the Randomness on the performances of the AGH
algorithms

In Table 5 we compare the results of our best proposed algorithm,
AMCC(0.80, 0.10)_20, with other constructive heuristics taken from the
literature. The comparisons are made only on a subset of 13 hard instances as
proposed in Vaessens et al. [26]. This subset includes a 10 x 5 problem (LA02),
two 10 x 10 problems (FT10 and LA19), three 15 x 10 problems (LA21, LA24,
LA25), two 20 x 10 problems (LA27, LA29), and five 15 x 15 instances (LA36 -
LA40). In the comparisons we consider the best Shifting Bottleneck procedure
(SB’88) proposed in Adams, Balas & Zawack [1], and the best Modified Shifting
Bottleneck procedure (SB’95) proposed in Balas et al. [4]. Moreover, we
consider the Beam Search algorithm (BS) of Sabuncuoglu and Bayiz [24], which
makes use of PDR rules, and the INSA insertion heuristic of Nowicki and
Smutnicki [18]. The proposed algorithm obtains results comparable with SB’88 in
terms of solution quality. It clearly outperforms the BS and the INSA
heuristics, whereas the SB’95 algorithm dominates it. In any case it has to be
pointed out that the SB’95 algorithm cannot be considered a simple greedy
heuristic. Since our algorithm is much more general, and its code is therefore
less efficient than the code of specialized algorithms, we decided not to
compare the computational times, even though the time needed by our algorithm
is acceptable.

Table 5. Comparisons on hard instances

Instance        Optimum   AMCC(0.80,0.10)_20    SB’88    SB’95       BS     INSA
FT10                930                  969      952      940     1016      994
LA02                655                  656      684      667      704      722
LA19                842                  846      863      878      882      971
LA21               1046                 1132     1128     1071     1154     1179
LA24                935                 1014     1015      976      992     1021
LA25                977                 1039     1061     1012     1073     1147
LA27               1235                 1332     1353     1272     1361     1466
LA29               1152                 1315     1233     1227     1252     1385
LA36               1268                 1329     1326     1319     1401     1445
LA37               1397                 1462     1471     1425     1503     1726
LA38               1196                 1324     1307     1294     1297     1307
LA39               1233                 1366     1301     1278     1369     1393
LA40               1222                 1297     1347     1262     1347     1387
Average error                          7.04%    6.76%    3.78%    8.96%   14.58%

6 Conclusions and Future Research 

In this paper we have introduced a new class of randomized greedy heuristics for
a general formulation of scheduling problems. Extensive computational results
are presented and a comparison with other algorithms for the classical job shop
scheduling problem is carried out. In particular, it turns out that our
approach outperforms other simple greedy schemes, but some specialized
algorithms dominate it. Nevertheless the computational times required by our
algorithm are acceptable. In the case of the ideal job shop the algorithms are
always able to find a feasible solution, whereas in the other constrained cases
(blocking with swap, blocking no swap and no-wait) the proposed algorithms
often fail to find a feasible solution in a single run. The problem of finding
a feasible solution and improving it has to be tackled by a metaheuristic
scheme. Future research will deal with developing metaheuristic algorithms
based on the proposed heuristics for solving more constrained scheduling
problems.



Abstract. Energy dissipation is a critical concern for battery-powered
embedded systems. Memory energy contributes significantly to overall
energy in data intensive applications. Low power memory systems are
being designed that support multiple power states of memory banks. In
low power states, energy dissipation is reduced but the time to access
memory is increased. We abstract an energy model for the memory system and
exploit it to develop algorithmic techniques for memory energy reduction.
This is achieved by exploring the structure and data access pattern
of a given algorithm to devise memory power management schedules. We
illustrate our approach through two well-known embedded benchmarks,
Matrix Multiplication and Fast Fourier Transform. The optimality of our
schemes is discussed using information theoretic lower bounds on memory
energy. Simulations demonstrate that significant energy reduction
can be achieved by using our approach over state-of-the-art
implementations.



1 Introduction 

Due to the explosive growth of portable, wireless devices and battery-operated
embedded systems, energy efficiency has become a critical concern for designers
today. Design technologies at all levels of abstraction are evolving with the
common goal of energy reduction. The significance of high level analysis in the
design cycle should not be underestimated. It is rapid, fairly accurate, and
platform independent. Moreover, decisions made at higher levels are likely to have a
larger impact on energy reduction than those at the lower levels of abstraction.
We have thus been motivated to explore energy efficient design and analysis
methodologies at the algorithmic level.

Until recently, the majority of research has focused on optimizing processor
energy by exploiting techniques such as dynamic voltage scaling [26] [17],
precision management, and IPC management [14]. Advancements in processor
technology have led to the development of low power processors such as the XScale
PXA250 [15] that dissipate less than 1 nJ per instruction. The next challenge
lies in the reduction of memory energy, which accounts for as much as 90%
of the overall system energy for CPU systems with peripherals [7]. For several
wireless applications implemented on the pico radio [25], more than 50% of the
overall energy is dissipated in the memory [27].

Advanced memory technologies such as the Mobile SDRAM [21] support
several low power features such as multiple power states, bank/row specific
activation, and partial array refresh (PASR). Our goal is to abstract the advanced
features of state-of-the-art memory systems and exploit them to design
algorithms that reduce memory energy dissipation. This is achieved by designing
algorithms optimized for a reduced number of memory accesses, and implementing
energy optimal memory power management schedules. We analyzed several
benchmark kernels from the EDN Embedded Microprocessor Benchmark Consortium
(EEMBC) [20] and the freely available Mibench Benchmarks [19], which
are discussed in [34]. In this paper we present our results for Matrix Multiplication
and Fast Fourier Transform (FFT). We discuss information theoretic lower
bounds for energy dissipation in the memory and use them to guide our
algorithm design. Simulation results (see Section 5) demonstrate that significant
energy reduction can be achieved using our techniques.

The rest of this paper is organized as follows. In Section 2 we discuss related
research. A high-level energy analysis is presented in Section 3 and lower bounds on
memory energy dissipation are discussed. An energy efficient memory architec- 
tural design is proposed in Section 3.3. Optimal memory activation schedules for 
some well-known kernels are discussed in Section 4. Our simulation framework 
and results are presented in Section 5. Finally, we conclude in Section 6. 



2 Related Work 

Memory technologies such as the SDRAM reduce energy dissipation by switch- 
ing to lower power states based on the time of inactivity in the current state. 
We call this implicit power management as the memory power state switching 
is defined only by the duration of inactivity (wait time) of the memory bank. 
Several researchers have exploited implicit power management for memory
energy reduction. Energy efficient page allocation techniques for general-purpose
processors have been proposed in [16]. In [3], the authors investigate array
allocation schemes to minimize memory energy dissipation. A large number of
benchmark applications have been analyzed in [9] to find the optimal wait time
for memory power state switching. The conclusion drawn suggests that memory
should immediately transition to a lower power state when inactive. However,
our simulations show that this is not always the optimal policy. Fig. 1 illustrates
normalized values (w.r.t. the case with no power management) for memory energy
dissipation for FFT with implicit power management for various wait times. For
smaller size FFT (n = 2®) all the policies result in increased energy dissipation. 
Energy reduction is observed for larger size problems with optimal wait-time as 
one cycle. Thus, we propose algorithm specific power management of the mem- 
ory. In this paper, we focus on design of optimal power management schemes for 
the memory based upon the structure and access pattern of the algorithms. 
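
The implicit (wait-time based) policy discussed above can be illustrated with a small trace replay. This is only a sketch under stated assumptions: the function name `implicit_policy`, the representation of the trace as increasing cycle numbers, and the rule that each access occupies exactly one cycle are all hypothetical and are not taken from the simulator behind Fig. 1.

```c
#include <assert.h>
#include <stddef.h>

/* Result of replaying an access trace under an implicit policy. */
typedef struct {
    long active_cycles; /* cycles the bank spends in the active mode */
    long transitions;   /* number of idle -> active wake-ups */
} implicit_stats;

/*
 * t[] holds the cycle numbers of the memory accesses, strictly increasing.
 * The bank starts idle. After an access it remains active for `wait`
 * further cycles; if no access arrives in that window it becomes idle and
 * the next access has to wake it up again.
 */
implicit_stats implicit_policy(const long *t, size_t n, long wait)
{
    implicit_stats s = {0, 0};
    assert(wait >= 0);
    if (n == 0)
        return s;

    s.transitions = 1;   /* first access wakes the bank */
    s.active_cycles = 1; /* the access cycle itself */
    for (size_t i = 1; i < n; i++) {
        long gap = t[i] - t[i - 1] - 1; /* cycles without any access */
        assert(gap >= 0);
        if (gap <= wait) {
            s.active_cycles += gap + 1;  /* bank never timed out */
        } else {
            s.active_cycles += wait + 1; /* timed out, then woken up */
            s.transitions++;
        }
    }
    s.active_cycles += wait; /* trailing wait before the final time-out */
    return s;
}
```

A short wait time lowers the active time but raises the number of wake-ups, which is the trade-off behind the observation that an immediate transition is not always the best policy.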

The AMRM project [1] focuses on adaptation of the memory hierarchy to 
reduce latency and improve power efficiency. Techniques such as off-chip memory 
assignment, set associativity and tiling to improve cache performance and energy 
efficiency have been investigated in [29]. The memory segmentation problem has
been shown to be NP-complete in [10]. Several researchers [24] [18] [22] [6] [4]
have explored memory organization and optimization for embedded systems.
Their approaches are summarized below.



Memory Architecture Customization. Memory allocation and memory 
bank customization problems deal with selection of memory parameters such 
as type, size, and number of ports. Memory building blocks and organization for 
application customized memory architectures are design synthesis problems and 
are not in the scope of this paper. We focus on algorithmic optimizations rather 
than hardware customizations. 



Application Specific Code and Data Layout Optimizations. Application 
specific platform-independent code (loop) and data flow transformations have 
been proposed to optimize the algorithm’s storage (memory, cache) and transfer 
requirements. This is followed by hardware customization (such as selection of 
a smaller cache) to reduce energy dissipation. Our approach is the reverse, as we
optimize algorithms for a fixed architecture.



Scratch Pad Memory. On-chip memory is partitioned into data cache and
scratch pad. Split spatial-temporal caches have been proposed to improve spatial 
and temporal data reuse in the cache. In our analysis, we propose use of a small 
(of the order of cache line size) memory buffer to aid in implementation of power 
management schemes. A partial bank of a Mobile SDRAM that can be power 
controlled independently can be used as buffer. Scratch pad memory can act as 
a buffer but at the expense of smaller cache size. 
Our approach aims at optimizing memory energy dissipation by improving
the memory access pattern of the algorithms. Higher data reuse reduces the
number of memory accesses. Algorithm directed power management schemes are
described within the algorithm to dynamically alter the memory power states.
Buffering, prefetching and blocking strategies are used to reduce energy and
latency overheads for memory power management. Memory energy is analyzed
using a simple, high-level memory energy model that considers memory to be
organized as multiple banks that can be power controlled independently. Since
we assume the architecture to be fixed, memory parameters (such as the number of ports,
banks, and interconnect bandwidth) are considered constant.

3 Memory Energy 

We define a high level model for memory energy analysis of algorithms and 
discuss lower bounds on memory energy dissipation of algorithms. 

3.1 Our Energy Model 

We consider our system to consist of a computational unit with an internal
memory (e.g. cache) connected to a memory unit over an interconnect (Fig. 2(a)). 
We abstract the low power features of the memory systems as discussed below. 

— Memory has multiple power modes (states). Fig. 2(b) illustrates the
current drawn in various power modes of the SDRAM [21] memory system.




(a) System Model 
Active: Memory can be read or written to only in the active mode. Power
dissipation is the highest in this mode. The energy dissipation is higher when 
the memory is being accessed and lower when it is in standby active mode. 
We do not consider the burst access mode in our current analysis. 

Refresh (idle/inactive): Power dissipation in this mode is low (zero for 
analysis). Data is preserved in this mode. Memory access in this mode results 
in higher latency as the memory must transit to the Active mode. 

Power Down: The power dissipation in this mode is the least but data is 
not preserved. For our analysis, we do not consider transition to this mode 
to ensure that the data is not lost. 

— Memory is organized as banks. Each bank can be placed in any power mode 
independent of the other banks. For example, Mobile SDRAM memory sup- 
ports bank and row specific activation and deactivation (precharge) and 
partial-array refresh (PASR) . 

— Transition of memory from one power mode to another incurs energy and 
time overheads. 

— Memory power management can be controlled through software. 

The total memory energy $E(N)$ for problem size $N$ is defined as the sum
of the memory access energy $E_a(N)$, the data storage energy $E_s(N)$, and the state
transition overheads $E_p(N)$. The memory access energy is proportional to the
memory data traffic. It also depends on parameters such as the load capacitance,
frequency and voltage of the memory I/O, but we assume these to be fixed. The
data storage energy is a function of the memory size and the time for which it is
active. Energy dissipation in the idle state is considered to be negligible. Let $K_a$
denote the memory access energy cost per unit of data, $K_s$ the storage energy
cost per unit of data per unit time, and $K_p$ the energy overhead for
each power state transition. The memory energy is defined as follows.

$$E(N) = E_a(N) + E_s(N) + E_p(N) = K_a \times C(N) + K_s \times S(N) \times A(N) + K_p \times P(N)$$

Here, $C(N)$ represents the total number of memory accesses and $S(N)$ is
the space complexity of the algorithm. We define the memory activation complexity
$A(N)$ as the time for which memory is in the active mode. For an algorithm
of time complexity $T(N)$, if memory is active for time fraction $\alpha$ then
$A(N) = \alpha T(N)$. For conventional systems that do not support memory power
management, $A(N) = T(N)$. $P(N)$ denotes the number of power mode transitions
of the memory banks.
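
The energy model above can be written down directly as a function. The following is a minimal sketch: the type `energy_costs`, the function names, and the use of `double` for all quantities are assumptions made for illustration, and the constants in any call are arbitrary units rather than measured values.

```c
#include <assert.h>

/* Cost constants of the model, in arbitrary energy units (hypothetical). */
typedef struct {
    double Ka; /* access energy per unit of data */
    double Ks; /* storage energy per unit of data per unit time */
    double Kp; /* energy overhead per power state transition */
} energy_costs;

/* A(N) = alpha * T(N): activation time for an active time fraction alpha. */
double activation_time(double alpha, double T)
{
    assert(alpha >= 0.0 && alpha <= 1.0 && T >= 0.0);
    return alpha * T;
}

/* E(N) = Ka*C(N) + Ks*S(N)*A(N) + Kp*P(N) */
double memory_energy(energy_costs k, double C, double S, double A, double P)
{
    assert(C >= 0.0 && S >= 0.0 && A >= 0.0 && P >= 0.0);
    return k.Ka * C + k.Ks * S * A + k.Kp * P;
}
```

For a system without power management, the call would use `activation_time(1.0, T)` and `P = 0`, which corresponds to $A(N) = T(N)$.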

3.2 Lower Bounds on Memory Energy 

Algorithms have been extensively analyzed and optimized in the past using the 
I/O complexity model [12] . This model measures performance as a function of the 
number of I/O operations, and abstracts systems where the latencies involved in
accessing external memory are much larger than those of internal processing.
Prior results on I/O complexity are of significant interest as they can be used to 
determine lower bounds for memory energy dissipation. 
Every computational unit in an embedded system has an internal memory of
fixed size as illustrated in Fig. 2(a). The interaction between the internal memory
and the on-chip memory (external to the computational unit) can be captured by
using the I/O model for analysis. We defined $C(N)$ as the number of memory
accesses, which is asymptotically the same as the I/O complexity $T_{I/O}(N)$ [28].

Theorem 1. A lower bound on memory energy dissipation $E(N)$ for an algorithm
with problem size $N$ and I/O complexity $T_{I/O}(N)$ is given by $\Omega(T_{I/O}(N))$.

Proof: We know $E(N) = K_a \times C(N) + K_s \times S(N) \times A(N) + K_p \times P(N)$ (see
Section 3.1). The memory must remain active for at least the time it is accessed.
Thus if the access latency is $l$ cycles, $A(N) = \Omega(l \times C(N))$. $S(N)$ represents the
space complexity of the algorithm or the size of memory required to store data.
$S(N) = k \times M$, where $M$ is the smallest segment (a bank in the case of a multi-banked
architecture) that can be power controlled independently. Each memory access
requires at least one memory bank to be active. In the best case scenario memory
power management overheads $P(N)$ are negligible. Thus,

$$E(N) = K_a \times C(N) + K_s \times S(N) \times A(N) = \Omega(K_a \times C(N) + K_s \times M \times C(N)) = \Omega(C(N)) = \Omega(T_{I/O}(N)) \qquad \Box$$

Note that the above lower bound on memory energy dissipation is independent 
of how the computation is performed (on RISC, DSP, FPGA). For a given kernel 
mapped on a single computation unit, it depends only on the size of the input 
data and the size of the internal memory of the computational unit. 

3.3 Memory Architecture Design 

Memory energy can be optimized by introduction of a small memory buffer of 
size B between the computation unit and the memory unit. Data is transferred 
from memory to buffer before it is accessed by the processor. The buffer is 
active all the time, but it permits the much larger sized memory module to 
remain idle for a longer time. The buffer is placed near the memory modules 
with a simple data transfer policy to/from memory. Thus, the latency for data 
transfer between the memory and the buffer is much lower than a memory access 
from a computational unit. The latter (for example a PCI) involves complex 
scheduling and bandwidth allocation. We consider the memory to buffer latency to
be unity while the buffer to computational unit latency is $l$.

A simple power management scheme is described as follows. A memory bank
is activated only when there is a data transfer required from/to the buffer.
The memory access energy and activation overheads are $O(T_{I/O}(N))$, which
is optimal. Next, consider the memory storage energy $E_s(N)$. In the absence of
the buffer, the entire memory must remain active while it is accessed. Hence
by definition $E_s(N) = K_s \times S(N) \times A(N)$, where $A(N) = \Omega(l \cdot C(N))$ and
$A(N) = O(T(N))$. The power management scheme described above reduces the
memory activation time to $A(N) = 1 \times C(N) = O(T_{I/O}(N))$. Thus,
$E_s(N) = K_s \times (M \times T_{I/O}(N) + B \times T(N))$. Any scheduled data transfer from the
buffer to the computational unit need not be of size larger than the line size of the
internal memory of the computational unit. Hence, it is sufficient to have $B = O(L)$,
where $L$ is the cache size. Cache size is typically in kilobytes whereas memory size
is in the range of megabytes, implying $M \gg B$. Therefore, energy dissipation in the buffer
can be ignored as compared to the memory, and
$E_s(N) = K_s \times M \times T_{I/O}(N) = \Theta(T_{I/O}(N))$. Thus, $E(N) = O(T_{I/O}(N))$. □

Remark 1. Since the memory access time is reduced from $l$ to 1, memory storage
energy is reduced by a factor of $l$ by using the buffer, even in the best case scenario
when $A(N) = O(l \cdot C(N))$ in the absence of the buffer.

Remark 2. The memory buffer increases the memory to processor data access latency
from $l$ to $l+1$, as data is fetched to the buffer before it is transferred to (from) memory.
However, data prefetch can be utilized to schedule the transfer of data into the buffer
before it is accessed by the computational unit.
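
The effect of the buffer on storage energy can be compared numerically. The sketch below assumes, as in the derivation of Section 3.3, that a single segment of size $M$ is active per access; the function names and the sample values are hypothetical and serve only to illustrate the two expressions.

```c
#include <assert.h>

/*
 * Best case without a buffer: the segment of size M stays active for the
 * full access latency l on each of the C accesses, so Es = Ks * M * l * C.
 */
double storage_energy_no_buffer(double Ks, double M, double l, double C)
{
    assert(Ks >= 0.0 && M >= 0.0 && l >= 1.0 && C >= 0.0);
    return Ks * M * l * C;
}

/*
 * With a buffer of size B: the segment is active for one cycle per access
 * and the buffer is active for the whole run time T, so
 * Es = Ks * (M * C + B * T).
 */
double storage_energy_with_buffer(double Ks, double M, double B,
                                  double C, double T)
{
    assert(Ks >= 0.0 && M >= 0.0 && B >= 0.0 && C >= 0.0 && T >= 0.0);
    return Ks * (M * C + B * T);
}
```

When $M \gg B$ the second term is small, and the ratio of the two results approaches the factor $l$ stated in Remark 1.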

4 Memory Energy Optimization 

Memory energy optimization involves designing algorithms that reduce memory- 
processor data transfers and implement energy optimal memory activation sched- 
ules. Our approach can be summarized as follows: 

— Understanding the memory access behavior of the kernel algorithm. 

— Minimizing the number of memory accesses required by optimizing the cache 
complexity of the algorithm. For example, data layout can be altered. 

— Designing power management schedule based on the memory access pattern. 

— Reducing the power management overheads. 

It is important to note that our analysis holds for all computational units with
an internal memory. For our simulations, we consider a RISC computational
unit. The data cache (internal memory for RISC) is of size $O(L)$.

4.1 Matrix Multiplication 

Matrix Multiplication is an embedded automotive/industrial benchmark [20]. 



Baseline (MMS). Consider multiplication of two $N \times N$ matrices A and B to
produce matrix D. The computation complexity of this algorithm is $O(N^3)$.
Computation of each element of D requires the corresponding row from A and column
from B to be fetched into the cache. There is no data reuse. The number of
memory to cache transfers $C(N)$ is at least $2N \times N^2 = O(N^3)$. $3N^2$ elements
need to be stored in memory. The memory is active all the time. The storage
energy is $O(A(N) \times S(N))$, which is $O(N^3) \times O(N^2) = O(N^5)$. Thus, memory
energy is given by $O(C(N)) + O(A(N) \times S(N)) = O(N^3) + O(N^5) = O(N^5)$.
Fig. 3. Memory Access Schedule



Blocked (MMB). We investigate an alternative implementation using a blocked
(tiled) data layout with block size $b$. The block should be able to fit into the
cache and thus, $b^2 = O(L)$. The arithmetic complexity for this implementation
remains $O(N^3)$. Data is fetched and operated upon as blocks, resulting in higher
data reuse in the cache. To compute $b^2$ elements of D, we require $2N \times b$ transfers.
$C(N)$ for this algorithm is reduced to $O(2N^3/b)$, which is an improvement by
a factor of $b$. Since the memory modules remain active for the entire duration,
the memory energy is given by
$O(C(N)) + O(A(N) \times S(N)) = O(N^3/b) + O(N^3 \times 3N^2) = O(N^5)$.
The memory access energy is decreased with reduced
data traffic, but the storage energy remains the same.
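
The transfer counts of the baseline and blocked schemes can be checked by counting loop iterations. This is a sketch only: the function names are hypothetical, every fetched element is counted as one transfer, and `n` is assumed to be a multiple of the block size `b`.

```c
#include <assert.h>

/* Baseline: each element of D fetches a full row of A and column of B. */
long transfers_baseline(long n)
{
    long count = 0;
    assert(n > 0);
    for (long i = 0; i < n; i++)
        for (long j = 0; j < n; j++)
            count += 2 * n; /* n elements of A plus n elements of B */
    return count;
}

/* Blocked: each block step fetches one b x b block of A and one of B. */
long transfers_blocked(long n, long b)
{
    long count = 0;
    assert(n > 0 && b > 0 && n % b == 0);
    long nb = n / b; /* number of blocks per matrix dimension */
    for (long ib = 0; ib < nb; ib++)
        for (long jb = 0; jb < nb; jb++)
            for (long kb = 0; kb < nb; kb++)
                count += 2 * b * b;
    return count;
}
```

For `n = 64` and `b = 16` the blocked count is smaller than the baseline count by exactly the factor `b`, in line with the $O(2N^3/b)$ bound.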

Memory Power Management (MMBPM). As the next level of optimization,
we reduce the activation time for the memory. This is achieved by explicitly
scheduling the memory power state transitions (see Fig. 5) based on the memory
access pattern of the algorithm. The data access timing diagram is illustrated in
Fig. 3. After each successive fetch of $O(b^2)$ elements from matrices A and B, computation
of $O(b^3)$ takes place. Energy can be saved by deactivating the memory modules
for this duration. Thus, memory activation time $A(N)$ is reduced to $O(N^3/b)$.
The storage energy is reduced to $O(N^3/b) \times O(N^2) = O(N^5/b)$.
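
The explicit schedule can be sketched as a counter that alternates between a fetch phase (memory active) and a compute phase (memory idle). The cycle costs used here, one cycle per fetched element and one cycle per multiply-add, are simplifying assumptions, and the names `schedule_stats` and `mmbpm_schedule` are hypothetical.

```c
#include <assert.h>

typedef struct {
    long active_cycles; /* cycles with the memory in the active mode */
    long total_cycles;  /* fetch plus compute cycles */
    long transitions;   /* power mode changes issued by the schedule */
} schedule_stats;

/* Replays the block schedule for n x n matrices with block size b. */
schedule_stats mmbpm_schedule(long n, long b)
{
    schedule_stats s = {0, 0, 0};
    assert(n > 0 && b > 0 && n % b == 0);
    long steps = (n / b) * (n / b) * (n / b); /* block multiplications */
    for (long step = 0; step < steps; step++) {
        long fetch = 2 * b * b;   /* one block of A and one block of B */
        long compute = b * b * b; /* multiply-adds on the cached blocks */
        s.active_cycles += fetch; /* memory is active only while fetching */
        s.total_cycles += fetch + compute;
        s.transitions += 2;       /* activate before, deactivate after */
    }
    return s;
}
```

Under these assumptions the active time is $2N^3/b$ cycles out of roughly $N^3$ in total, which matches the reduction of $A(N)$ to $O(N^3/b)$.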

Memory Buffer (MMBBUF). Using a memory buffer and blocking, the
energy of the system can be decreased to $O(N^3/b)$ as described in Section 3.3.
Since we can predict which block will be required for computation next, data 
prefetch [2] can be used to hide buffer to memory data transfer latencies. 

The I/O complexity as analyzed by Hong and Kung [12] is given by:

Lemma 1. For matrix multiplication of two $N \times N$ matrices on a processor
with $L$ words of memory, the I/O complexity bound is given by $\Omega(N^3/\sqrt{L})$.

Using Theorem 1 and Lemma 1, the following result holds.

Theorem 2. Memory energy dissipation for Matrix Multiplication of two $N \times N$
matrices is $\Omega(N^3/\sqrt{L})$, where $L$ is the cache size.

Remark 3. We discussed an energy optimal implementation of Matrix Multiplication
(MMBBUF). For $b = \sqrt{L}$, where $L$ represents the cache size, the memory
energy is reduced to $O(N^3/\sqrt{L})$. This is optimal as shown by Theorem 2.

4.2 Fast Fourier Transform 

Fourier Transforms are used in digital signal processing in several embedded
applications such as the Asymmetric Digital Subscriber Line (ADSL) where
data is converted from the time domain to the frequency domain to reduce error 
rate. The problem size in Mibench is 2^® and we use the same for analysis. We 
examine the Cooley-Tukey algorithm, which involves recursive decomposition of
a larger FFT into smaller sub-problems that can be solved efficiently. For
computation of an $N$-point FFT, the computation complexity of this algorithm is
$O(N \log_2 N)$. Consider computation of an $N_1 \times N_2$-point FFT. The data layout
in the memory can be analyzed in terms of data stride, which is defined as the
distance between data blocks that are accessed successively. The stride of data
for the $N_2$-point computation is determined by the $N_1$-point FFT computation.
The cache performance is poor if the stride is too large. Dynamic Data
Layouts [23] [31] can be used to improve the cache performance. These have 
been exploited to improve the time performance [23]. 



Baseline (FFT). As our baseline case, we chose an optimal implementation
from the FFTW library and analyzed its cache behavior as a function of the data
stride. For small strides, there is a large spatial locality in the data. Therefore,
there are only compulsory cache misses, and for an $N$-point FFT, $O(N/b)$ misses
are encountered. Here $b$ is the cache block size. However, if the stride is very large,
cache performance is very poor due to an increased number of conflict misses. A
cache miss could be incurred for every data access required for the FFT computation,
of which there are $O(N \log_2 N)$.
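
The dependence of cache misses on the data stride can be reproduced with a toy direct-mapped cache. This is an illustrative sketch, not the FFTW access pattern: the function `strided_misses`, the direct-mapped organization, and the repeated sweeps standing in for the passes of an FFT are all assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Counts misses in a direct-mapped cache with num_lines lines of
 * line_words words each, when n elements spaced `stride` words apart
 * are swept `passes` times in the same order.
 */
long strided_misses(long n, long stride, long line_words, long num_lines,
                    int passes)
{
    assert(n > 0 && stride > 0 && line_words > 0 && num_lines > 0);
    assert(passes > 0);
    long *tag = malloc((size_t)num_lines * sizeof *tag);
    assert(tag != NULL);
    for (long s = 0; s < num_lines; s++)
        tag[s] = -1; /* empty line */

    long misses = 0;
    for (int p = 0; p < passes; p++) {
        for (long i = 0; i < n; i++) {
            long block = (i * stride) / line_words; /* memory block number */
            long set = block % num_lines;           /* line it maps to */
            if (tag[set] != block) {                /* miss: replace line */
                misses++;
                tag[set] = block;
            }
        }
    }
    free(tag);
    return misses;
}
```

With unit stride and data that fits in the cache, only the compulsory $N/b$ misses occur; with a stride equal to the cache size every access conflicts and misses.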



Dynamic Data Layouts (FFTD). Data reorganization could be performed 
prior to every FFT computation to reduce the data stride. However, this itself 
dissipates a lot of energy as it could involve $O(N)$ memory accesses. Therefore,
prior to each FFT computation, a tradeoff is performed between the data reor- 
ganization overheads involved and the acceptable data stride. The FFT decom- 
position strategy is selected taking this tradeoff into consideration. Significant 
energy reduction is achieved by improving the cache performance. Details of the 
algorithm are discussed in [34]. 



Memory Power Management (FFTDPM). Conventional computation of
FFT follows this sequence of operations: two elements are fetched
from memory, operated on, and the result is placed back in the memory. This
approach requires the memory to be active all the time. By increasing the interval
between successive data fetches, we can exploit memory power management
for energy reduction. This is achieved by block computation. FFT is computed
over blocks of size $b = O(L)$, where $L$ is the cache size. $b$ elements are fetched,
FFT is computed over this block and the result is placed back. This involves
only $O(N \log_b N)$ data fetches. Moreover, it permits us to schedule the memory
power state switching. Each time, $b$ elements of data are fetched and computed
upon for $O(b \log b)$ time before the next data transfer is scheduled. Memory is
activated and deactivated based on this schedule. The memory activation time
is reduced from $O(N \log_2 N)$ to $O(N \log_2 N / \log_2 b)$, which results in an
improvement of storage energy by a factor of $O(\log_2 b)$.
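
The saving from block computation can be made concrete by counting the passes over the data. The sketch below assumes that each pass fetches all $N$ elements once and that a block of size $b$ completes $\log_2 b$ butterfly stages per pass; the function names are hypothetical.

```c
#include <assert.h>

/* Smallest p with b^p >= n: the number of block passes over the data. */
long fft_block_passes(long n, long b)
{
    assert(n >= 1 && b >= 2);
    long passes = 0;
    long reach = 1; /* transform size covered after `passes` passes */
    while (reach < n) {
        reach *= b;
        passes++;
    }
    return passes;
}

/* Each pass fetches all n elements once, giving n * ceil(log_b n) fetches. */
long fft_fetches(long n, long b)
{
    return n * fft_block_passes(n, b);
}
```

For `n = 1024`, a block size of 32 needs two passes where the conventional scheme (`b = 2`) needs ten, a ratio of $\log_2 32 = 5$.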

Memory Buffer (FFTDBUF). Next, we utilize the memory buffer along with
data prefetch. The storage energy in all the previously discussed implementations
is $A(N) \times S(N)$, which is $O(N^2 \log_b N)$. Using a memory buffer, we can
reduce this to $O(N \log_b N)$ (see Section 3.3).

Hong and Kung [12] have shown the following result:

Lemma 2. The I/O time for computing an $N$-point FFT on a processor with $L$
words of memory is at least $\Omega(N \log_2 N / \log_2 L)$.

From Theorem 1 and Lemma 2, the following result holds.

Theorem 3. For computing an $N$-point Fast Fourier Transform, the lower
bound for memory energy is $\Omega(N \log_2 N / \log_2 L)$, where $L$ is the cache size.

Remark 4. The energy reduction techniques discussed above achieve an energy
optimal implementation of FFT. The optimal bound for memory energy for our
implementation is $\Omega(N \log_2 N / \log_2 L)$.

5 Simulations 

Energy estimation for the algorithms described above is challenging as currently
there is no hardware or middleware support, nor are there simulators that support
algorithm directed power management. Thus, we designed a high level energy
estimation framework for fast and yet fairly accurate energy estimation of algorithms.

5.1 Simulation Framework 

The simulator is based upon instruction level analysis, where the cost of each
instruction depends on the power state of the system. This level of
abstraction was chosen for several reasons, the foremost being speed without
much loss in accuracy. Moreover, it is a suitable abstraction from an algorithm
designer’s perspective. An algorithm description augmented with the power man- 
agement schedule (see Fig. 5) is supplied as input to the simulator. The Sim- 
pleScalar Toolset [5] is configured for the chosen architecture and modified to 



Algorithmic Techniques for Memory Energy Reduction 247 



provide an instruction execution trace with an embedded power schedule. The 
trace consists of a limited set of instructions (7 currently). The classification is 
based on their energy costs. 

— State: The State instruction permits the designer to change the power state 
of the system by changing the power mode of any component. For example 
the memory can be placed in a low power inactive mode. 

— Compute: This represents all processor only instructions that do not require 
any cache or memory access. Measurements [30] have demonstrated that 
there is little power variation among these instructions. 

— ReadCache and WriteCache: Data is accessed from the cache. 

— ReadMem and WriteMem: A cache miss occurs resulting in a memory access. 

— Prefetch: Data is prefetched into the buffer. Energy costs are similar to 
ReadMem but there is no latency in fetching the data. 

Each instruction has an associated power and time cost depending on the 
power state of the system. The State instruction denotes change in the power 
state of the memory and accounts for the state transition overheads. A profile is 
maintained for power dissipation in each architecture component to identify hot 
spots. We have modeled the Itsy Pocket Computer for which detailed measurements
are available [35]. Currently we model only the power modes of the memory
and the clock enable/disable mode of the processor. The Itsy measurements
do not profile state transitions. The LART [26] measurements are
used to obtain overheads for changing the processor mode. The memory activa- 
tion/deactivation latency is considered to be one cycle. 
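
The trace-driven estimation described above can be sketched as a loop over the seven instruction classes with a cost table indexed by the memory power mode. This is not the authors' simulator: the type and function names, the two-mode memory, the implicit wake-up on an access to idle memory, and all cost values supplied by a caller are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* The seven instruction classes of the trace. */
typedef enum {
    OP_STATE, OP_COMPUTE, OP_READ_CACHE, OP_WRITE_CACHE,
    OP_READ_MEM, OP_WRITE_MEM, OP_PREFETCH, OP_COUNT
} op_kind;

typedef struct {
    op_kind kind;
    int new_mode; /* used by OP_STATE only: 1 = active, 0 = idle */
} trace_op;

typedef struct {
    long cost[2][OP_COUNT]; /* energy per instruction, by memory mode */
    long transition;        /* energy per memory mode change */
} cost_model;

/* Sums the energy of a pre-computed trace; the memory starts in start_mode. */
long trace_energy(const trace_op *trace, size_t n, const cost_model *m,
                  int start_mode)
{
    long energy = 0;
    int mode = start_mode;
    assert(mode == 0 || mode == 1);
    for (size_t i = 0; i < n; i++) {
        op_kind k = trace[i].kind;
        assert(k < OP_COUNT);
        if (k == OP_STATE) { /* explicit power management instruction */
            assert(trace[i].new_mode == 0 || trace[i].new_mode == 1);
            if (trace[i].new_mode != mode) {
                energy += m->transition;
                mode = trace[i].new_mode;
            }
            continue;
        }
        /* A memory access while idle forces a wake-up first. */
        if (mode == 0 &&
            (k == OP_READ_MEM || k == OP_WRITE_MEM || k == OP_PREFETCH)) {
            energy += m->transition;
            mode = 1;
        }
        energy += m->cost[mode][k];
    }
    return energy;
}
```

Because the trace is fixed in advance, a sketch like this shares the limitation noted in the text: it cannot capture dynamic effects of a mode change, such as extra cache misses caused by activation delays.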




Fig. 4. Simulation framework 
TransformNtoB(A, Ab, k, m, kblk, mblk);
TransformNtoB(B, Bb, n, k, nblk, kblk);
TransformNtoB(C, Cb, n, m, nblk, mblk);
for(jb=jj=0;jb<nblk;jb++){
  jbs = ((jj+=BS) < n) ? BS: n-jj+BS;
  for(lb=ll=0;lb<kblk;lb++){
    lbs = ((ll+=BS) < k) ? BS: k-ll+BS;
    bpt = &(Bb(jb,lb));
    for(ib=ii=0;ib<mblk;ib++){
      ibs = ((ii+=BS) < m) ? BS: m-ii+BS;
      a = &(Ab(ib,lb)); t = &(Cb(ib,jb));
      POWERMG(STATE MEMORY MODE 0)
      for(i=0;i<ibs;i++){ b = bpt;
        for(j=0;j<jbs;j++){ temp_c=0;
          for(l=0;l<lbs;l++){
            temp_c += a[l] * b[l];}
          t[j] += temp_c; b += BS;}
        a+=BS; t+=BS;}
      POWERMG(STATE MEMORY MODE 1)}}}

TransformBtoN(Cb, C, n, m, nblk, mblk);



Fig. 5. Sample Code 



The goal of this simulator is to guide algorithm design by identifying the 
trend. It cannot guarantee high accuracy due to several reasons. It does not 
simulate the dynamic effects of changing the power state. For example, consider 
a scenario where memory is inactive. The trace shows a cache miss. There is 
a delay associated with memory activation. This could have resulted in more 
cache misses in a running system. Since our trace is pre-computed, such effects
are not accounted for.

The accuracy of the simulator depends on the power models incorporated. 
Some of the values have been approximated when measurements are not avail- 
able. For example, the power during a state transition is approximated as the 
average power between previous and next state. Variation in input data may 
change the switching activity. We assume an average power cost for each in- 
struction. We have used benchmark data when available, and randomly gener- 
ated data otherwise. The framework is fast, modular and sufficiently accurate 
for algorithmic analysis. Once some algorithm designs are identified to be energy 
efficient they can be tested on lower level simulators or implementation boards 
for higher accuracy. 

5.2 Simulation Results 

Matrix Multiplication. We simulated matrix multiplication for n = 32 and
n = 64 and the results are illustrated in Fig. 6. Note that to improve the clarity
of the figure we have scaled down the results for n = 64 by a factor of 8. The
energy reduction increases with problem size. We consider n = 64. Our
algorithm using blocking (MMB) with block size 16 reduced energy dissipation by
93.6% while keeping memory active all the time. This reduction was achieved due to
improved cache behavior and thus the execution time also decreased. We examined
the effect of implicit power management (MMSP) on the two algorithms.
We assumed memory becomes idle if not accessed for 1 cycle. The energy of
the conventional algorithm was reduced by 57.4%, but the execution time
increased due to memory activation overheads. The same policy applied to our
blocked algorithm (MMBP) reduced energy by 96.4% with no increase in time.
Explicit power management (see Fig. 5) (MMBPM) reduced the number of
accesses to the memory when it was inactive. Memory energy was reduced by
96.43%. Memory power management in the presence of a memory buffer (MMBBUF)
reduced memory energy by 37% over MMBPM. Thus, an overall memory
energy reduction of 97.7% was achieved with no increase in execution time.



Fast Fourier Transform. Simulation results for FFT are illustrated in Fig. 6. 
We have scaled down the results for large n to improve clarity. Data reorgani- 
zation (FFTD) and implicit power management (FFTP) are only beneficial for 
large size problems. We observe an increase in energy for both these techniques 
for n = 2® due to high overheads. For larger problems, energy is reduced and 
the improvement is proportional to problem size. Implicit power management
(FFTP) reduces energy by 57%. For n = 2^^, we observe an energy reduction of 5%
using our algorithm (FFTD) without power management, 59.7% using implicit
power management (FFTDP), 60% using memory power management (FFTDPM),
and 70% using memory buffer and power management (FFTDBUF).

6 Conclusion 

In this paper, we presented algorithmic techniques for memory energy reduction
by reducing data traffic and designing efficient power management schedules.
Note that reduction in data traffic also decreases energy dissipation over the
(high capacitance) interconnect. Currently, we do not exploit all the features of
the memory, such as lower energy dissipation by accessing data in a burst (see
Fig. 2(b)); this will be investigated in our future work. We will also integrate
other power management schemes into our analysis, such as DVS for the processor,
to understand the interactions between the processor and the memory.
For example, slowing the processor may reduce processor energy but increase 
memory latency and energy. 
Abstract. Scheduling problems have attracted the attention of the algorithms
community for several decades. A large number of scheduling
problems have been proposed and studied, and many different techniques
have been devised for solving them. Among the reasons why scheduling
problems are so fascinating is their rich variety, both in form and in
complexity. Furthermore, there is a sea of applications for scheduling
problems, which arise from an equally varied number of areas.

This note briefly surveys a technique that has been successfully used to 
solve (or approximately solve) a large number of scheduling problems 
with minimax objective function. 



1 Introduction 

Scheduling problems have attracted the attention of the algorithms community 
for several decades. The importance of scheduling problems stems, partly, from 
their large and varied set of applications, including manufacturing, transporta- 
tion and planning, operating systems, and communication networks, among oth- 
ers. 

Many artificial and real-life scheduling problems have been studied, yielding 
as a result a rich and powerful scheduling theory. A variety of techniques for 
solving these problems have been devised, along with methods for determining 
their complexity. Many of these results have strongly influenced research in other 
areas. 

Most scheduling problems can be described very concisely, and all of them
involve a set J of jobs that needs to be processed by a group M of machines.
The processing of every job ji requires a certain amount of time, which might
or might not depend on the machine(s) selected to process it. Additionally, jobs might
have some properties that need to be considered when producing a schedule 
for them: there might be some constraints in the order in which jobs might be 
processed; a job might require more than one machine for its processing; a job 
might be suspended and resumed at a later time; a job might have a due date 
before which it must be completed; and so on.
Machines might be identical, or they might have different speeds. Machines
might be available only during some times, or a machine might need to collab- 
orate with other machines to process a job. 

When scheduling a set of jobs, it is normally desired to minimize or maximize
some objective function. Typical objectives are to minimize the maximum
completion time of the jobs, to minimize their average completion time, and to
minimize the maximum tardiness (completion time beyond the due date) of the
jobs.

Equally varied is the set of techniques and methods that have been proposed 
to deal with scheduling problems. For an overview of these techniques the reader 
is referred to [5, 12, 13]. In this paper we are interested in studying polynomial 
time algorithms that provide exact or approximate solutions for scheduling prob- 
lems. Furthermore, we are mainly interested in scheduling problems that have a 
min-max objective function: minimize the maximum completion time, lateness, 
or tardiness of the jobs. The completion time of a job is the time when the job 
completes its processing. If jobs have due dates, then the lateness of a job is 
defined as the difference between its completion time and its due date (note that 
the lateness of a job is negative if the job completes before its due date). The 
tardiness of a job is the maximum between zero and its lateness. 
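
The three min-max objectives just defined can be made concrete with a small sketch. The function name `min_max_objectives` and the list-based input format are illustrative assumptions, not notation from the paper.

```python
def min_max_objectives(completions, dues):
    """Return (makespan, maximum lateness, maximum tardiness).

    completions[i] is the completion time of job i and dues[i] its due date.
    Lateness is completion minus due date (negative if the job is early);
    tardiness is the larger of zero and the lateness.
    """
    lateness = [c - d for c, d in zip(completions, dues)]
    tardiness = [max(0, l) for l in lateness]
    return max(completions), max(lateness), max(tardiness)
```

For instance, completion times 3, 7, 5 with due dates 4, 6, 9 give lateness values -1, 1, -4, so only the second job is tardy.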

We describe a framework for designing polynomial time approximation
schemes for a large number of scheduling problems with min-max objective
function. This framework has been successfully used on a very large number
of problems [1, 2, 4, 6, 7, 8, 9, 10, 11, 14]. A polynomial time approximation
scheme (PTAS) is an algorithm that, for any precision value $\varepsilon > 0$,
produces a solution of value at most $1 + \varepsilon$ times the value of an
optimum solution. The running time of a PTAS is polynomial in the size of the
input, but it might be super-polynomial in the inverse $1/\varepsilon$ of the
precision.



2 The Framework 

We assume that we know a lower bound $LB$ and an upper bound $UB$ for the value
$OPT$ of the optimum solution and, furthermore, we assume that the bounds are
“tight” in the sense that $UB/LB \le \alpha$, for some constant value $\alpha$.
This framework might be used on problems for which there is an algorithm $A$
that computes solutions of value $LB + O(P_{\max})$, where $P_{\max}$ is the
maximum contribution of a job to the value of the objective function. We will
specify in the next section the meaning of the “contribution of a job to the
value of the objective function”. For example, for the problem of scheduling
jobs on identical machines to minimize the maximum completion time, $P_{\max}$
is the maximum job processing time. Note that algorithm $A$ produces a
near-optimal solution if all jobs are “small” in the sense of the magnitude of
their individual contributions to the total value of the objective function.
Many scheduling problems have the property that they can be solved almost
optimally when all jobs are “small”.

The set of jobs is split into three groups, commonly called the large, medium,
and small jobs. Let $P_i$ be the contribution of job $j_i$ to the value of the
solution. Medium jobs $j_i$ are chosen so that each one of them has a small
$P_i$ value and the sum of the $P_i$ values of all the medium jobs is also
small. More specifically, for some small constant value $\varepsilon > 0$ and
constant positive integer value $k$, a medium job $j_i$ has value
$\varepsilon^{2k} LB < P_i \le \varepsilon^{2(k-1)} LB$, and the sum of the
$P_i$ values of the medium jobs is at most $\varepsilon m(LB)$. Note that
$k \le \lceil 1/\varepsilon \rceil$. The set of medium jobs constitutes
a smaller instance of the original problem formed by small jobs only. If a lower
bound $LB_M$ for this instance is small compared to $LB$, then algorithm $A$
above can be used to find a small value solution for it. If it is possible to
combine this solution with a near-optimal schedule for the large and small jobs,
then we obtain a near-optimal solution for the original problem.
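
The choice of $k$ and the three-way split can be sketched as follows. This is a minimal illustration under stated assumptions: the medium range is read as $\varepsilon^{2k} LB < P_i \le \varepsilon^{2(k-1)} LB$, the total of all $P_i$ is assumed to be at most $m \cdot LB$ (so that some $k \le \lceil 1/\varepsilon \rceil$ must succeed), and the name `split_jobs` is hypothetical.

```python
import math

def split_jobs(P, LB, m, eps):
    """Return (k, large, medium, small) as lists of job indices.

    Tries k = 1, 2, ... and stops at the first k for which the jobs with
    eps**(2k) * LB < P_i <= eps**(2(k-1)) * LB have total value <= eps * m * LB.
    The candidate ranges are disjoint, so if sum(P) <= m * LB at least one of
    the first ceil(1/eps) ranges has a small enough total (pigeonhole).
    """
    for k in range(1, math.ceil(1 / eps) + 1):
        lo = eps ** (2 * k) * LB
        hi = eps ** (2 * (k - 1)) * LB
        medium = [i for i, p in enumerate(P) if lo < p <= hi]
        if sum(P[i] for i in medium) <= eps * m * LB:
            large = [i for i, p in enumerate(P) if p > hi]
            small = [i for i, p in enumerate(P) if p <= lo]
            return k, large, medium, small
    raise ValueError("no suitable k found; is sum(P) <= m * LB?")
```

With `P = [8, 8, 1, 1, 0.25]`, `LB = 10`, `m = 2` and `eps = 0.5`, the range for $k = 1$ contains both jobs of value 8 (total 16, too large), while the range for $k = 2$ contains only the two jobs of value 1, so $k = 2$ is selected.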

The interesting property of this partitioning is that the instance composed
of large and small jobs consists only of jobs (large) with value
$P_i > \varepsilon^{2(k-1)} LB$ and jobs (small) with value
$P_i \le \varepsilon^{2k} LB$. This gap between the values of small and large
jobs makes it possible to deal with large and small jobs “almost”
independently. We deal with the large jobs by first rounding their values up so
they are multiples of $\varepsilon^{2k-1} LB$. This rounding increases the value
of a large job by at most a factor $1 + \varepsilon$ and, thus, it increases the
value of an optimum schedule for the large and small jobs by at most the same
factor.
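
The rounding step can be checked numerically. The sketch below assumes the rounding unit is $\varepsilon^{2k-1} LB$ and that every large job exceeds $\varepsilon^{2(k-1)} LB$; the function name `round_up_large` is illustrative only.

```python
import math

def round_up_large(P_large, LB, eps, k):
    """Round each large value up to the next multiple of eps**(2k-1) * LB.

    A large job has P_i > eps**(2(k-1)) * LB, and rounding adds less than one
    unit, i.e. less than eps * eps**(2(k-1)) * LB < eps * P_i. Each value
    therefore grows by a factor of at most 1 + eps.
    """
    unit = eps ** (2 * k - 1) * LB
    return [math.ceil(p / unit) * unit for p in P_large]
```

With `eps = 0.5`, `k = 2` and `LB = 10` the unit is 1.25, so values 8 and 3 become 8.75 and 3.75, both within a factor 1.5 of the originals.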

Since each large job has a value that is only a constant factor smaller than
the value of an optimum solution, only a constant number of them can be
processed by a particular machine. Hence, we can use dynamic programming (or
enumeration if there is a constant number of large jobs) to find a short feasible
schedule for them. Divide the time interval $[0, UB]$ into sub-intervals of size
$\varepsilon^{2k-1} LB$. Consider a machine and a feasible schedule for the
large jobs on that machine. This schedule can be described by a binary vector
with one component per sub-interval, called a configuration. Since there are at
most $\alpha/\varepsilon^{2k-1}$ sub-intervals, the number of different
configurations is at most $p = 2^{\alpha/\varepsilon^{2k-1}}$.
Note that $p$ is a huge, but constant number. The schedule for the entire set of
machines is described by a $p$-dimensional vector, whose $i$-th component
indicates the number of machines with the $i$-th configuration. We can find
feasible short schedules for the large jobs by taking every large job and trying
to place it in each one of the available configurations. The running time of
this dynamic program is $O(m^p)$.

For each feasible schedule for the large jobs, the small jobs are assigned to
the empty gaps left by them. This is usually done by using a linear program
or a greedy algorithm. When assigning small jobs to gaps we must ensure that
the instance defined by the small jobs on a gap of size $d$ has optimum value at
most $d + O(P_h)$, where $P_h$ is the value of the largest small job in the gap.
Finally, algorithm $A$ is used on each one of the gaps containing small jobs to
find a feasible schedule for them. The size of each gap is increased by
$O(P_{\max})$ to accommodate the schedule produced by $A$. Since $A$ schedules
only small jobs, this increase is of value $O(\varepsilon^{2k}) LB$. As there
are at most $O(1/\varepsilon^{2k-1})$ gaps, the overall increase caused by $A$
is $O(\varepsilon) LB$.

Among all the solutions computed by the above algorithm, the one with 
smallest value is finally selected.



3 Some Problems

Let us apply the framework on a few scheduling problems in which the objec- 
tive is to minimize the maximum completion time or makespan. The first four 
problems deal with identical machines. 



Identical Machines. For each job $j_i$, $P_i$ is its processing time. It is well
known that valid bounds for the optimum solution are
$LB = \max\{\frac{1}{m}\sum_i P_i,\; P_{\max}\}$ and
$UB = \frac{1}{m}\sum_i P_i + P_{\max}$; $UB/LB \le 2$. Moreover, the list
scheduling algorithm behaves exactly as described for algorithm $A$. For the
instance of the
problem formed by the medium jobs, A finds a schedule of length at most 
eLB + < 2eOPT. The schedule for the medium jobs is appended 

to the schedule for the large and small jobs. A greedy algorithm is used to assign
small jobs to empty gaps so that a gap of size $d$ with $m'$ idle machines gets
small jobs of total size at most $m'd$.
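
As an illustration of algorithm $A$ in this setting, the sketch below implements list scheduling (each job in list order goes to the currently least loaded machine) together with the two bounds, read here as $LB = \max\{\frac{1}{m}\sum_i P_i, P_{\max}\}$ and $UB = \frac{1}{m}\sum_i P_i + P_{\max}$. The function names are illustrative assumptions.

```python
import heapq

def list_schedule(p, m):
    """Assign jobs in list order to the least loaded of m identical machines.

    Returns (makespan, assignment) where assignment[i] is the machine of job i.
    """
    loads = [(0, j) for j in range(m)]      # (current load, machine index)
    heapq.heapify(loads)
    assignment = []
    for pi in p:
        load, j = heapq.heappop(loads)      # least loaded machine
        assignment.append(j)
        heapq.heappush(loads, (load + pi, j))
    return max(load for load, _ in loads), assignment

def makespan_bounds(p, m):
    """Lower and upper bounds on the optimum makespan."""
    average = sum(p) / m
    return max(average, max(p)), average + max(p)
```

For processing times 3, 3, 2, 2, 2 on two machines the list schedule has length 7, which lies between the bounds 6 and 9.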



Release Dates. For each job $j_i$ we choose $P_i = p_i + r_i$, where $p_i$ is
its processing time and $r_i$ is its release date. Tight bounds for the optimum
solution are $LB = \max\{\frac{1}{m}\sum_i p_i,\; p_{\max},\; r_{\max}\}$ and
$UB = \frac{1}{m}\sum_i p_i + p_{\max} + r_{\max} \le 3LB$, where $p_{\max}$
and $r_{\max}$ are the largest processing time and release date, respectively.
Note that if a feasible schedule for $J$ is shifted by $\varepsilon^{2k-1} LB$
units of time, then we can disregard release dates for small jobs (recall that
small jobs have small $P_i$ value and not just small processing times). Also, if
medium jobs are scheduled after time $r_{\max}$, we can disregard their release
dates and, thus, the list scheduling algorithm can be used as algorithm $A$.
Observe also that after rounding, we might assume that processing times and
release dates of large jobs are integer multiples of $\varepsilon^{2k-1} LB$.



Delivery Times. Assume that besides its processing time, every job $j_i$ also
has a delivery time $q_i$. Given a feasible schedule, the delivery completion time
of a job $j_i$ is defined as $C_i + q_i$, where $C_i$ is the time when job $j_i$ finishes its
processing. The problem is to minimize the maximum delivery completion time. 
This problem is equivalent to that of minimizing the maximum lateness of the 
jobs. 

For every job we choose $P_i = p_i$. If $q_{\max}$ is the maximum delivery time,
then we can choose $LB = \max\{\frac{1}{m}\sum_i p_i,\; p_{\max},\; q_{\max}\}$
and $UB = \frac{1}{m}\sum_i p_i + p_{\max} + q_{\max}$.
To account for the delivery times we introduce a non-bottleneck machine Mq 
which can “process” the delivery times. Each configuration is augmented to 
accommodate Mq. In each augmented configuration we only need to keep track 
of the largest delivery completion time in Mq. If two augmented configurations 
are identical except for the delivery completion time on Mq, we only keep the 
smaller one. 

Once more, we can use the list scheduling algorithm as algorithm A. However, 
when we assign small jobs to gaps, we consider the small jobs in non-increasing
order of delivery time. Thus, small jobs with large delivery times are placed early
in the schedule, and small jobs with small delivery times are placed near the end 
of the schedule. 

Medium jobs are scheduled with the list scheduling algorithm, but this time 
the schedule for the small and large jobs is appended to the schedule for the 
medium jobs. 
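
The effect of ordering jobs by non-increasing delivery time can be seen on a single machine. The following sketch is only an illustration of that ordering rule, not the full gap assignment described above; the function names and the `(processing time, delivery time)` pair format are assumptions.

```python
def max_delivery_completion(jobs):
    """Maximum of C_i + q_i when jobs run back to back in the given order.

    jobs is a list of (processing time, delivery time) pairs.
    """
    t, worst = 0, 0
    for p, q in jobs:
        t += p                      # completion time C_i of this job
        worst = max(worst, t + q)   # delivery completion time C_i + q_i
    return worst

def by_delivery_time(jobs):
    """Return the jobs sorted by non-increasing delivery time."""
    return sorted(jobs, key=lambda job: -job[1])
```

For the jobs `(2, 1)`, `(1, 5)`, `(3, 3)`, processing them in the given order yields a maximum delivery completion time of 9, while the order by non-increasing delivery time yields 7.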

Chain Precedence Constraints. Consider that there are chain precedence
constraints restricting the processing order of the jobs. Assume that each
precedence chain has fixed size $\mu$.

We consider every chain $S_i$ as a single job and define $P_i = p_i$. Let
$S_{\max}$ be the total processing time of the largest chain. Then, we can
choose $LB = \max\{\frac{1}{m}\sum_i p_i,\; S_{\max}\}$ and
$UB = \frac{1}{m}\sum_i p_i + S_{\max}$. The upper bound is achieved
by the list scheduling algorithm.

In [9] it is shown that the value $\varepsilon$ can be chosen so that all large chains $S_i$
consist of only jobs with length at least (smaller jobs in long chains 

can be disregarded by increasing the length of the schedule by $O(\varepsilon) LB$). By
considering each chain as a single job, the list scheduling algorithm can be used 
as algorithm A. 

Medium and small chains are scheduled as in the case without precedence 
constraints. 

Uniform Machines. Assume that each machine $j$ has a speed $s_j$, so the time
needed to process job $j_i$ on machine $j$ is $p_i/s_j$. Assume that the minimum
and maximum machine speeds are 1 and $s_{\max}$, respectively. Furthermore,
consider that $s_{\max}$ is constant.

For each job $j_i$ we choose $P_i = p_i$. Tight bounds for the value of the
optimum solution are
$LB = \max\{\sum_i p_i / \sum_j s_j,\; p_{\max}/s_{\max}\}$ and
$UB = \sum_i p_i / \sum_j s_j + p_{\max} \le 2LB$. The upper bound is achieved
by an algorithm that considers the jobs in non-increasing order of processing
time and schedules a job on the machine that minimizes its completion time [3].
This is algorithm $A$.
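
The greedy rule just described can be sketched as follows, assuming that a job with processing requirement $p_i$ takes time $p_i/s_j$ on a machine of speed $s_j$. The name `greedy_uniform` is hypothetical and ties are broken by the lowest machine index.

```python
def greedy_uniform(p, speeds):
    """Schedule jobs on uniform machines.

    Jobs are taken in non-increasing order of processing requirement; each job
    is placed on the machine where it would complete earliest.
    Returns (makespan, assignment) with assignment[i] = machine of job i.
    """
    finish = [0.0] * len(speeds)            # current finishing time per machine
    assignment = {}
    for i in sorted(range(len(p)), key=lambda i: -p[i]):
        j = min(range(len(speeds)), key=lambda j: finish[j] + p[i] / speeds[j])
        finish[j] += p[i] / speeds[j]
        assignment[i] = j
    return max(finish), assignment
```

With requirements 4, 2, 2 and speeds 1 and 2, the largest job goes to the fast machine (finishing at time 2), the next job to the slow machine (also finishing at 2), and the last job to the fast machine, for a makespan of 3.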

The small jobs are greedily assigned to empty gaps, so that a gap of length $d$
in which the subset $S$ of machines is idle gets small jobs of total processing
time at most $d \sum_{j \in S} s_j$.

3.1 Fixed Number of Machines 

If the number m of machines is fixed, then the number of large jobs is also fixed. 
Hence, we can use enumeration instead of dynamic programming to schedule the 
large jobs. The main advantage of using enumeration is that we can ensure that 
one of the schedules constructed for the large jobs is identical to the schedule for 
the large jobs in an optimum solution. This is a powerful observation, that allows 
us to use the framework to design approximation algorithms for very complex 
problems [1, 2, 4, 7, 8, 10, 11, 14]. When considering such complex problems it is 
usually not easy to assign the small jobs to the empty gaps left by the large jobs.
Thus, linear programming is used instead of a greedy algorithm. A disadvantage
of using linear programming is that it might produce fractional assignments of 
small jobs to gaps. 

To illustrate the use of the framework when the number of machines is fixed, 
let us consider a few problems involving minimizing the makespan. 



Unrelated Machines. When machines are unrelated, the processing time of
a job depends on the machine that processes it. The processing time of job $j_i$
on machine $k$ is denoted as $p_{ik}$. Let $d_i = \min_k p_{ik}$. We choose
$LB = \frac{1}{m}\sum_i d_i$ and $UB = \sum_i d_i = m(LB)$. The upper bound is
achieved if the jobs are scheduled sequentially on their fastest machines. We
choose $P_i = d_i$.

The total processing time of the medium jobs is at most
$\varepsilon m(LB) = O(\varepsilon) LB$. Hence, the medium jobs are scheduled
sequentially on their fastest machines, and this schedule is appended to the
schedule for the large and small jobs.
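
The bounds for unrelated machines are easy to compute from the processing-time matrix. The sketch below assumes the reading $d_i = \min_k p_{ik}$, $LB = \frac{1}{m}\sum_i d_i$ and $UB = \sum_i d_i$; the function name and the row-per-job matrix layout are illustrative choices.

```python
def unrelated_bounds(p):
    """p[i][k] is the processing time of job i on machine k.

    Returns (d, LB, UB): the fastest processing time of every job, the
    average-load lower bound, and the length of the schedule that runs all
    jobs one after another on their fastest machines.
    """
    m = len(p[0])
    d = [min(row) for row in p]
    ub = sum(d)
    return d, ub / m, ub
```

For three jobs with rows `[2, 4]`, `[3, 1]`, `[5, 5]` on two machines, the fastest times are 2, 1, 5, giving `UB = 8` and `LB = 4`, so `UB = m * LB` as stated.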

Fix a feasible schedule for the large jobs. Let $t$ be the maximum completion
time of the large jobs in this schedule. The time interval $[0, t]$ is divided
into sub-intervals of size $\varepsilon^{2k-1} LB$ as above, and we consider an
additional sub-interval that starts at time $t$ and ends at time $T$ ($T$ is a
variable). For each job $j_i$ we introduce variables $x_{ik}^{\ell}$ to denote
the fraction of job $j_i$ that is scheduled on machine $k$ during sub-interval
$\ell$. For each sub-interval $\ell$, let $S_{\ell}$ be the set of machines
available for processing small jobs and let $L_{\ell}$ be the length of the
sub-interval. Then, we assign small jobs to the empty gaps left by the large
jobs by using the following linear program.



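\[
\begin{array}{lll}
\text{Minimize} & T & \\
\text{s.t.} & \sum_{k,\ell} x_{ik}^{\ell} = 1 & \text{for all small jobs } j_i \\
 & \sum_{i,k} x_{ik}^{\ell}\, p_{ik} \le L_{\ell}\,|S_{\ell}| & \text{for all sub-intervals } \ell \\
 & x_{ik}^{\ell} = 0 & \text{for all } k \notin S_{\ell} \\
 & x_{ik}^{\ell} \ge 0 & \text{for all } k \in S_{\ell} \\
 & T \ge 0 &
\end{array}
\]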



This linear program has one constraint for each small job, plus a constant
number of additional constraints for the sub-intervals. Hence, a basic
feasible solution of the linear program has only a constant number of fractional
values on the variables $x_{ik}^{\ell}$. Small jobs with fractional assignments
are simply moved to the end of the schedule, where they are placed sequentially
on their fastest machines. This increases the length of the schedule by only
$\varepsilon m(LB) = O(\varepsilon) LB$.

Finally, the other small jobs are simply scheduled according to the solution 
of the linear program. 



Unrelated Machines with Precedence Constraints. We assume now that 
there are precedence constraints restricting the order in which jobs can be
processed. The precedence constraints partition the set of jobs into dependent
groups Si, such that two jobs from the same group Si need to be processed
in an order consistent with the precedence constraints, but jobs belonging to 
different groups are independent. Here we assume that each one of the groups
has fixed size, and the largest group has $\mu$ jobs.

For a group Si let Di = re-define our set of jobs by considering 

each group as a single job, and choosing Pi = Di. The jobs in each large group 
are rounded so that each one of them has size at least Note that there 

is a constant number of different relative orders for the jobs belonging to the 
same group and, hence, there is a constant number of different schedules for the 
jobs belonging to large groups. 

Fix a schedule for the jobs in the large groups. To allocate small jobs to empty
gaps we use a slightly more complicated linear program than above. This time,
for each small group $S_i$ we define variables indexed by $k$ and $\ell$, where
$k$ and $\ell$ are $\mu$-dimensional vectors indicating the machines and
intervals where the jobs from $S_i$ are to be placed.

After solving the linear program, the small and medium jobs are scheduled 
as above.

Abstract. Social network analysis is a subdiscipline of the social
sciences using graph-theoretic concepts to understand and explain social
structure. We describe the main issues in social network analysis. General 
principles are laid out for visualizing network data in a way that conveys 
structural information relevant to specific research questions. Based on
these principles, innovative graph drawing techniques integrating the analysis
and visualization of social networks are introduced.



1 Introduction 

Social Network Analysis is a subdiscipline of the social sciences using graph- 
theoretic concepts to describe, understand and explain, sometimes even predict 
or design, social structure. It is focused on uncovering the patterning of people’s 
interaction and based on the intuitive notion that these patterns are important 
features of the lives of the individuals who display them. Starting from social 
sciences the study of social networks became an interdisciplinary field. On one 
hand, it is guided by formal theory organized in mathematical terms, on the other 
hand grounded in the systematic analysis of empirical data. Network analysis 
has found important applications in organizational behavior, inter-organizational 
relations, the spread of contagious diseases, mental health, social support, the 
diffusion of information and animal social organization. 

Since the 1980s, a yearly international conference on social network analysis,
called SUNBELT, has been organized by the International Network for Social Network
Analysis, INSNA P|. A comprehensive, though non- visual, tool for social network 
analysis is UCINET [2]. For a comprehensive summary of social network analysis,
its levels of analysis and its methodological tools see m- 

Also applications such as the analysis of Web graphs, bibliographic data, or
financial data often use methods similar or identical to those of social network
analysis. Recently, there has been growing interest in understanding the
structure, dynamics and evolution of the Internet and WWW, and in this way
social network analysis has been rediscovered in other fields. Especially
physicists in the complex systems community are interested in the statistical
mechanics of complex networks.
The very general questions in complex systems are how networks emerge, what
they look like, and how they evolve. This includes networks from such diverse
areas as physics, biology, economics, ecology, and computer science. Modeling 
networks as dynamical systems, network morphogenesis and self-organization, 
as well as new graph theoretical aspects and network reconstruction from exper- 
imental data are considered, and 0. It seems that because of this new 

emerging interest in networks at all graph theory and graph algorithms attract 
increasing attention from other sciences CHI, Ca- 

In 1996, we began a cooperation with researchers from political science, aimed
at providing the methodology of social network analysis with tailor-made means 
of automated visualization. Given the importance of visualizations for scientific 
development, it is astonishing how little attention the subject had received so far 
in the analysis of social networks. One of the rare exceptions is m Even though 
a fair amount of software has been available to facilitate graphical editing, and 
even automatic layout of networks, the state of the art at that time seemed too
heuristic to be satisfactory for supporting network analysis.

One of the first outcomes of our interdisciplinary cooperation was a survey of 
visualization methods in use at that time mi- In that paper, general principles 
are laid out for visualizing network data in a way that conveys structural infor- 
mation relevant to specific research questions. These general principles resulted 
in innovative uses of graph drawing methods for social network visualization, 
and prototypical implementations thereof. With the growing demand for access 
to these methods, we started implementing an integrated tool for public use, the 
tool visone HH). The main application area of visone is a methodological approach
in the social sciences. Its usage is focused on graphs of small to medium size. As 
an alternative especially for large graphs, we recommend to try Pajek [H|. 

2 Social Networks 

A social network consists of entities such as persons, organizations, or things, 
that are linked by binary relations such as social relations, dependencies, or ex- 
change. These relations may be directed or undirected, weighted or unweighted, 
and weights, if present, may be interpreted as increasing or decreasing the tie be- 
tween the two entities. Since data is often gathered by means of questionnaires, 
not even the existence of an edge is a sure thing. The two respondents corre- 
sponding to the end-vertices of a potential edge may have different perceptions 
regarding the presence of a specific type of tie between them. It is a long-standing 
debate whether unconfirmed edges should be included in an analysis. Typically, 
researchers decide to either treat unconfirmed edges like confirmed edges, or to 
exclude them completely. A crucial feature in many studies is the interrelation 
between the structure of a social network and the attributes of its elements. 

We define a social network to be a labeled directed graph
$G = (V, E = E_C \cup E_U; \delta, \omega)$, where $E_C$ and $E_U$ are disjoint
sets of confirmed and unconfirmed edges, $\delta : E \to \mathbb{R}_{\ge 0}$ is
a non-negative edge length, and $\omega : E \to \mathbb{R}_{\ge 0}$ a
non-negative edge strength. A vertex or edge attribute is a (partial) function
assigning nominal or numerical values to vertices or edges.

Although we cannot put any restrictions on the class of graphs, typical
examples from social science projects are sparse, locally dense, and exhibit a small
average distance between pairs of vertices. 
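
The definition above maps naturally onto a small data structure. The sketch below is an illustration only: the class name `SocialNetwork`, its field names, and the `edges` method are assumptions, with the method modelling the two common treatments of unconfirmed edges mentioned in this section (treat them like confirmed edges, or exclude them).

```python
from dataclasses import dataclass, field

@dataclass
class SocialNetwork:
    """Labeled directed graph with confirmed and unconfirmed edges."""
    vertices: set = field(default_factory=set)
    confirmed: set = field(default_factory=set)     # E_C: directed pairs (u, v)
    unconfirmed: set = field(default_factory=set)   # E_U: disjoint from E_C
    length: dict = field(default_factory=dict)      # delta: edge -> value >= 0
    strength: dict = field(default_factory=dict)    # omega: edge -> value >= 0

    def edges(self, include_unconfirmed=True):
        """Edge set used for an analysis under one of the two policies."""
        if include_unconfirmed:
            return self.confirmed | self.unconfirmed
        return set(self.confirmed)
```

A network with a confirmed tie `("a", "b")` and an unconfirmed tie `("b", "c")` has two edges under the first policy and one under the second.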

3 Analysis 

The purpose of social network analysis is to identify important vertices, crucial 
relationships, subgroups, roles, network characteristics, and so on, to answer 
substantive questions about structures. There are three main levels of interest: 
the element, group, and network level. On the element level, one is interested 
in properties (both absolute and relative) of single actors, links, or incidences. 
Examples for this type of analysis are bottleneck identification and structural 
ranking of network items. On the group level, one is interested in classifying 
the elements of a network and properties of subnetworks. Examples are actor 
equivalence classes and cluster identification. Finally, on the network level, one 
is interested in properties of the overall network such as connectivity or balance. 

While we have an intuitive understanding of what makes a vertex important or
central, there is no universally accepted definition of importance. Centrality of 
a vertex may for example be measured according to the degree of that vertex, 
its distance to all other vertices or the number of shortest paths between two 
other vertices that contain the vertex itself. Similarly, there are different notions 
of importance or status in a directed graph. We refer to m for an unification 
and overview of such indices. Similarly, mathematical terms that capture to
what extent networks tend to build clusters, like the clustering coefficient, or
how networks evolve, like the degree distribution, are of interest 0. Questions 
regarding the overall structure ask, for example, to what extent the network
exhibits the small-world phenomenon m- 
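
Three of the indices mentioned above can be sketched for an undirected, connected graph given as an adjacency dictionary. The concrete normalizations (closeness as the reciprocal of the distance sum, the local clustering coefficient as the fraction of adjacent neighbor pairs) are common textbook choices and are assumptions here, since the text does not fix them.

```python
from collections import deque
from itertools import combinations

def bfs_distances(adj, s):
    """Hop distances from s to every vertex reachable from s."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def degree(adj, v):
    return len(adj[v])

def closeness(adj, v):
    """Reciprocal of the sum of distances from v to all other vertices."""
    total = sum(bfs_distances(adj, v).values())
    return 1 / total if total else 0.0

def clustering(adj, v):
    """Fraction of pairs of neighbors of v that are themselves adjacent."""
    nbrs = adj[v]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (len(nbrs) * (len(nbrs) - 1))
```

On a triangle `a`, `b`, `c` with a pendant vertex `d` attached to `c`, vertex `c` has the highest degree and closeness, while its clustering coefficient is lower than that of `a` because `d` is not connected to the other neighbors.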

Algorithmic aspects in network analysis concern the fast computation of such 
indices. Vertex indices are often easily computable in polynomial time. However, 
more efficient algorithms that are also applicable to large graphs, such as the fast
algorithm for betweenness centrality presented in 0, are of increasing interest 
in this context. 
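
To give an idea of what a fast betweenness computation looks like, the sketch below uses the well-known dependency-accumulation technique for unweighted graphs: one breadth-first search per source vertex, followed by a backward sweep that sums up pair dependencies. Whether this matches the cited algorithm in every detail is an assumption; the function name `betweenness` and the adjacency-dictionary input are illustrative.

```python
from collections import deque

def betweenness(adj):
    """Betweenness of every vertex, counted over ordered pairs (s, t).

    adj maps each vertex to an iterable of neighbors. For an undirected graph
    each unordered pair is counted twice; halve the result if that is unwanted.
    Runs in O(n * m) time for n vertices and m edges.
    """
    cb = {v: 0.0 for v in adj}
    for s in adj:
        stack = []                            # vertices in order of discovery
        pred = {v: [] for v in adj}           # predecessors on shortest paths
        sigma = {v: 0 for v in adj}           # number of shortest s-v paths
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:               # w found for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:    # shortest path to w via v
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = {v: 0.0 for v in adj}         # dependency of s on each vertex
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return cb
```

On the path `a`, `b`, `c` only the middle vertex lies on a shortest path between two other vertices, once for each direction.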

4 Visualization 

In graph drawing, algorithms are designed to produce what is often
termed an “aesthetic” visualization of a graph. In network analysis, the demand
that visualizations are not misleading is perhaps even more important. Hence
there are two obvious criteria for the quality of social network visualizations:

1. Is the information manifest in the network represented accurately? 

2. Is this information conveyed efficiently? 

With these criteria in mind, the following three aspects should be carefully
thought through when creating network visualizations E):

- the substantive aspect the viewer is interested in,

- the design (i.e. the mapping of data to graphical variables), and

- the algorithm employed to realize the design (artifacts, efficiency, etc.).

Depending on the context, actors of high structural importance are
interpreted as being central or as having high status. With this substantive aspect
in mind, we designed visualizations that represent vertex indices by constrain- 
ing vertex positions to fixed distances from the center or from the bottom of 
the drawing, in either case depending linearly on the vertex index. These ideas 
have been further developed and applied in the following projects. We also refer 
to PS] for a more detailed description and figures illustrating the results. 
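
The radial constraint described above (distance from the center depending linearly on the vertex index) can be sketched in a few lines. This is only an illustration of the constraint, not the layout method of the cited work: the even spreading of angles is a placeholder for a real crossing-reducing layout, and the names `radial_positions` and `r_max` are assumptions.

```python
import math

def radial_positions(index, r_max=1.0):
    """Place vertices on concentric circles around the origin.

    index maps each vertex to its centrality value. The vertex with the
    highest value sits at the center; the radius grows linearly as the value
    decreases, up to r_max for the lowest value. Angles are spread evenly in
    sorted vertex order, purely as a placeholder.
    """
    lo, hi = min(index.values()), max(index.values())
    span = (hi - lo) or 1.0                 # avoid division by zero
    positions = {}
    for n, (v, value) in enumerate(sorted(index.items())):
        radius = r_max * (hi - value) / span
        angle = 2 * math.pi * n / len(index)
        positions[v] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions
```

With index values 0, 2 and 1 for vertices `a`, `b` and `c`, vertex `b` is placed at the center, `c` at half the maximum radius, and `a` on the outer circle.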

Drug Policy. This project [20] studies the presence of HIV-preventive measures
for IV-drug users in nine selected German municipalities. The substantive ques- 
tion underlying this research is, why municipalities with comparable problem 
pressure differ significantly in the provision of HIV-preventive measures such as 
methadone substitution or needle exchange. The policy networks under scrutiny 
comprise all local organizations directly or indirectly involved in the provision 
of such measures. The actors included in the study were queried about relations 
such as strategic collaboration, common activities, or informal communication 
with other organizations in the same municipality. None of the networks has 
more than 120 edges of the same type, and typically more than 50% of them are 
unconfirmed. In a three-stage force-directed method for centrality layouts is 
presented, and in jO] a simple, purely combinatorial algorithm is developed. 

Industry Privatization. The second study deals with networks of public, 
societal and private organizations that developed during the privatization of in- 
dustrial conglomerates in East Germany as part of the economic transformation 
after German unification in 1990. Their privatization is understood as political 
bargaining processes between actors that are connected by ties such as exchange 
of resources, command, or consideration of interest. The privatization was fore- 
seen to be carried out by the Treuhandanstalt, a public agency of the federal 
government. Due to its institutional position and its ownership of all companies, 
it was generally assumed to be one of the most powerful actors in the trans- 
formation of East Germany. As part of the analysis, status indices are used as 
indicators for the power or influence of actors. In m a layered layout algorithm 
is outlined that visually supports status analyses of networks. A refinement of 
this algorithm uses the linear-time algorithm of HH for coordinate assignment. 

Topic Identification. Our third example illustrates the use of methods from 
social network analysis in another domain, namely topic identification in texts 
by centering resonance analysis m- The structure of texts is represented by 
graphs that have a vertex for each word occurring in a noun phrase and an 
edge for each pair of words that appear together in the same noun phrase or 
consecutively in the same sentence. It is argued that words corresponding to
nodes with high betweenness centrality in such a graph are important for the
structure of the text and thus a proxy for its topic. This method was applied
to Reuters news dealing with the terrorist attacks of September 11, 2001 [7] to
identify, among other things, the main topics, topic changes, side stories, etc. in 
the news. Centrality visualizations can then be used to show for example the 
main topics identified for the very first day of media coverage. 
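
The construction of the word graph can be sketched as follows. The input format (each sentence already split into noun phrases, each phrase a list of words) and the reading of "consecutively" as linking the last word of one noun phrase to the first word of the next are simplifying assumptions; noun-phrase extraction itself is outside the scope of this sketch, and the name `word_graph` is hypothetical.

```python
from itertools import combinations

def word_graph(sentences):
    """Build an undirected word graph from pre-extracted noun phrases.

    sentences is a list of sentences; each sentence is a list of noun phrases;
    each noun phrase is a list of words. Words in the same phrase are linked
    pairwise, and consecutive phrases of a sentence are linked through their
    boundary words. Returns an adjacency dictionary of sets.
    """
    adj = {}

    def link(a, b):
        if a != b:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)

    for phrases in sentences:
        for phrase in phrases:
            for word in phrase:
                adj.setdefault(word, set())
            for a, b in combinations(phrase, 2):
                link(a, b)
        for left, right in zip(phrases, phrases[1:]):
            if left and right:
                link(left[-1], right[0])
    return adj
```

Words with high betweenness centrality in the resulting graph would then be the candidates for topic words.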



5 visone

The visone software US] is implemented in C++ using LEDA, the Library of
Efficient Data Types and Algorithms [22]. While the user interface is a
customized version of LEDA’s GraphWin class, all graph generation, analysis, and
layout algorithms (except for LEDA’s force-directed layout routine) have been 
implemented from scratch. 

Starting with version 1.1, the main data format used in visone will be the
XML sublanguage GraphML (Graph Markup Language) PIJ . GraphML support 
is implemented in a LEDA extension package which will be made available for 
public use. It will hence be possible to administer project files with several social 
networks and any number of attributes. Data attributes can be mapped freely 
to graphical attributes like color, shape, and so on.